Abstract



Bregman divergences are important distance measures that are used extensively in data-driven appli- 
cations such as computer vision, text mining, and speech processing, and are a key focus of interest in 
machine learning. Answering nearest neighbor (NN) queries under these measures is very important in 
these applications and has been the subject of extensive study, but is problematic because these distance 
measures lack metric properties like symmetry and the triangle inequality. 

In this paper, we present the first provably approximate nearest-neighbor (ANN) algorithms. These
process queries in $O(\log n)$ time for Bregman divergences in fixed dimensional spaces. We also obtain
$\mathrm{polylog}\, n$ bounds for a more abstract class of distance measures (containing Bregman divergences) which
satisfy certain structural properties. Both of these bounds apply to both the regular asymmetric Bregman
divergences as well as their symmetrized versions. To do so, we develop two geometric properties vital to
our analysis: a reverse triangle inequality (RTI) and a relaxed triangle inequality called $\mu$-defectiveness,
where $\mu$ is a domain-dependent parameter. Bregman divergences satisfy the RTI but not $\mu$-defectiveness.
However, we show that the square root of a Bregman divergence does satisfy $\mu$-defectiveness. This allows
us to then utilize both properties in an efficient search data structure that follows the general two-stage
paradigm of a ring-tree decomposition followed by a quad tree search used in previous near-neighbor
algorithms for Euclidean space and spaces of bounded doubling dimension.

Our first algorithm resolves a query for a $d$-dimensional $(1+\varepsilon)$-ANN in $O\left(\left(\frac{\mu \log n}{\varepsilon}\right)^{O(d)}\right)$ time and
$O(n \log^{d-1} n)$ space and holds for generic $\mu$-defective distance measures satisfying a RTI. Our second
algorithm is more specific in analysis to the Bregman divergences and uses a further structural constant,
the maximum ratio of second derivatives over each dimension of our domain ($c_0$). This allows us to locate
a $(1+\varepsilon)$-ANN in $O(\log n)$ time and $O(n)$ space, where there is a further $(c_0)^d$ factor in the big-Oh for
the query time.



1 Introduction 

The nearest neighbor problem is one of the most extensively studied problems in data analysis. The past
20 years have seen tremendous research into the problem of computing near neighbors efficiently as well as
approximately in different kinds of metric spaces.

An important application of the nearest-neighbor problem is in querying content databases (images, text, 
and audio databases, for example). In these applications, the notion of similarity is based on a distance metric 
that arises from information-theoretic or other considerations. Popular examples include the Kullback-Leibler 
divergence [16], the Itakura-Saito distance [20] and the Mahalanobis distance [27]. These distance measures
are examples of a general class of divergences called the Bregman divergences [9], and this class has received 
much attention in the realm of machine learning, computer vision and other application domains. 

Bregman divergences possess a rich geometric structure but are not metrics in general, and are not even 
symmetric in most cases! While the geometry of Bregman divergences has been studied from a combinatorial 
perspective and for clustering, there have been no algorithms with provable guarantees for the fundamental 
problem of nearest-neighbor search. This is in contrast with extensive empirical study of Bregman-based 
near-neighbor search [10, 33, 34, 36, 37].

In this paper we present the first provably approximate nearest-neighbor (ANN) algorithms for Bregman
divergences. Our first algorithm processes queries in $O(\log^{d} n)$ time using $O(n \log^{d} n)$ space and only uses
general properties of the underlying distance function (which includes Bregman divergences as a special
case). The second algorithm processes queries in $O(\log n)$ time using $O(n)$ space and exploits structural
constants associated specifically with Bregman divergences. An interesting feature of our algorithms is that
they extend the "ring-tree + quad-tree" paradigm for ANN searching beyond Euclidean distances and metrics
of bounded doubling dimension to distances that might not even be symmetric or satisfy a triangle inequality.

1.1 Overview of Techniques 

At a high level [35], low-dimensional Euclidean approximate near-neighbor search works as follows. The
algorithm builds a quad-tree-like data structure to search the space efficiently at query time. Cells reduce 
exponentially in size, and so a careful application of the triangle inequality and some packing bounds allows 
us to bound the number of cells explored in terms of the "spread" of the point set (the ratio of the maximum to 
minimum distance). Next, terms involving the spread are eliminated by finding an initial crude approximation 
to the nearest neighbor. Since the resulting depth to explore is bounded by the logarithm of the ratio of the 
cell sizes, any $c$-approximation of the nearest neighbor results in a depth of $O(\log(c/\varepsilon))$. A standard data
structure that yields such a crude bound is the ring tree [26]. 

Unfortunately, these methods (which work also for doubling metrics [14, 26, 7]) require two key properties:
the existence of the triangle inequality, as well as packing bounds for fitting small-radius balls into
large-radius balls. Bregman divergences in general are not symmetric and do not even satisfy a directed triangle
inequality! We note in passing that such problems do not occur for the exact nearest neighbor problem in
constant dimension: this problem reduces to point location in a Voronoi diagram, and Bregman Voronoi
diagrams possess the same combinatorial structure as Euclidean Voronoi diagrams [8].

Reverse Triangle Inequality The first observation we make is that while Bregman divergences do not 
satisfy a triangle inequality, they satisfy a weak reverse triangle inequality: along a line, the sum of lengths 
of two contiguous intervals is always less than the length of the union. This immediately yields a packing 
bound: intuitively, we cannot pack too many disjoint intervals in a larger interval because their sum would 
then be too large, violating the reverse triangle inequality. 

$\mu$-defectiveness The second idea is to allow for a relaxed triangle inequality. We do so by defining a
distance measure to be $\mu$-defective w.r.t. a given domain if there exists a fixed $\mu \ge 1$ such that for all triples of
points $x,y,z$, we have that $|D(x,y) - D(x,z)| \le \mu D(y,z)$. This notion was first employed by Farago et al. [19]
for an algorithm based on optimizing average case complexity.

A different natural way to relax the triangle inequality would be to show there exists a fixed $\mu \le 1$ such
that for all triples $(x,y,z)$, the inequality $D(x,y) + D(y,z) \ge \mu D(x,z)$ holds. In fact, this is the notion of $\mu$-similarity
used by Ackermann et al. to cluster data under a Bregman divergence. However, this version of a relaxed
triangle inequality is too weak for the nearest-neighbor problem, as we see in Figure 1.




Figure 1: The ratio $\frac{D(q,\mathrm{cand})}{D(q,\mathrm{nn}_q)} = \mu$, no matter how small $c$ is

Let $q$ be a query point, cand be a point from $P$ such that $D(q,\mathrm{cand})$ is known and $\mathrm{nn}_q$ be the actual nearest
neighbor to $q$. The principle of grid related machinery is that for $D(q,\mathrm{nn}_q)$ and $D(q,\mathrm{cand})$ sufficiently large,
and $D(\mathrm{cand},\mathrm{nn}_q)$ sufficiently small, we can verify that cand is a $(1+\varepsilon)$ nearest neighbor, i.e. we can
short-circuit our grid.

Figure 1 illustrates a case where this short-circuit may not be valid for $\mu$-similarity. Note that
$\mu$-similarity is satisfied here for any $c < 1$. Yet the ANN quality of cand, i.e. $\frac{D(q,\mathrm{cand})}{D(q,\mathrm{nn}_q)}$, need not be better than
$\mu$ even for arbitrarily close $\mathrm{nn}_q$ and cand! This demonstrates the difficulty of naturally adapting the Ackermann
notion of $\mu$-similarity to finding a $(1+\varepsilon)$ nearest neighbor.

In fact, the relevant relaxation of the triangle inequality that we require is slightly different. Rearranging
terms, we instead require that there exist a parameter $\mu \ge 1$ such that for all triples $(x,y,z)$,
$|D(x,y) - D(x,z)| \le \mu D(y,z)$. We call such a distance $\mu$-defective. It is fairly straightforward to see that a $\mu$-defective distance
measure is also $2/(\mu+1)$-similar, but the converse does not hold, as the example above shows.

Without loss of generality, assume that $D(x,y) \ge D(x,z) \ge D(y,z)$. Then $D(x,y) - D(x,z) \le \mu D(y,z)$ and
$D(x,y) - D(y,z) \le \mu D(x,z)$, so $2D(x,y) \le (\mu+1)(D(x,z) + D(y,z))$. Since $D(x,y)$ is the greatest of the
three distances, this inequality is the strongest and implies the corresponding $2/(\mu+1)$-similarity inequalities
for the other two distances.

Unfortunately, Bregman divergences do not satisfy $\mu$-defectiveness for any size domain or value of $\mu$!
One of our technical contributions is demonstrating in Section 4 that, surprisingly, the square root of Bregman
divergences does satisfy this property with $\mu$ depending on the boundedness of the domain and choice of
divergence.

A Generic Approximate Near-Neighbor Algorithm After establishing that Bregman divergences satisfy
the reverse triangle inequality and $\mu$-defectiveness (Section 4), we first show (Section 6) that any distance
measure satisfying the reverse triangle inequality, $\mu$-defectiveness, and some mild technical conditions
admits a ring-tree-based construction to obtain a weak near neighbor. However, applying it to a quad-tree
construction creates a problem. The $\mu$-defectiveness of a distance measure means that if we take a unit length
interval and divide it into two parts, all we can expect is that each part has length between $1/2$ and $1/(\mu+1)$.
This implies that while we may have to go down to level $\lceil \log_2 (1/\ell) \rceil$ to guarantee that all cells have side length
$O(\ell)$, some cells might have side length as little as $\ell^{\log_2(\mu+1)}$, weakening packing bounds considerably.

We deal with this problem in two ways. For Bregman divergences, we can exploit geometric properties
of the associated convex function $\phi$ (see Section 3) to ensure that cells at a fixed level have bounded size
(Section 8); this is achieved by reexamining the second derivative $\phi''$.

For more general abstract distances that satisfy the reverse triangle inequality and $\mu$-defectiveness, we
instead construct a portion of the quad tree "on the fly" for each query (Section 7). While this is expensive, it
still yields $\mathrm{polylog}(n)$ bounds for the overall query time in fixed dimensions. Both of these algorithms rely on
packing/covering bounds that we prove in Section 5.

An important technical point is that for exposition and simplicity, we initially work with the symmetrized
Bregman divergences (of the form $D_{s\phi}(x,y) = D_\phi(x,y) + D_\phi(y,x)$), and then extend these results to general
Bregman divergences (Section 9). We note that the results for symmetrized Bregman divergences might be
interesting in their own right, as they have also been used in applications [33, 34, 32, 30].

2 Related Work 

Approximate nearest-neighbor algorithms come in two flavors: the high dimensional variety, where all bounds 
must be polynomial in the dimension d, and the constant-dimensional variety, where terms exponential 
in the dimension are permitted, but query times must be sublinear in n. In this paper, we focus on the 
constant-dimensional setting. The idea of using ring-trees appears in many works [25, 26, 23], and a good
exposition of the general method can be found in Har-Peled's textbook [35, Chapter 11].

The Bregman distances were first introduced by Bregman [9]. They are the unique divergences that satisfy
certain axiom systems for distance measures, and are key players in the theory of information geometry.
Bregman distances are used extensively in machine learning, where they have been used to unify boosting
with different loss functions [15] and unify different mixture-model density estimation problems [6]. A first
study of the algorithmic geometry of Bregman divergences was performed by Nielsen, Nock and Boissonnat
[8]. This was followed by a series of papers analyzing the behavior of clustering algorithms under Bregman
divergences.

Many heuristics have also been proposed for spaces endowed with Bregman divergences. Nielsen and Nock
developed a Frank-Wolfe-like iterative scheme for finding minimum enclosing balls under Bregman
divergences. Cayton [10] proposed the first nearest-neighbor search strategy for Bregman divergences, based
on a clever primal-dual branch and bound strategy. Zhang et al. [37] developed another prune-and-search
strategy that they argue is more scalable and uses operations better suited to use within a standard database
system. For good broad reviews of near neighbor search in theory and practice, the reader is referred to the
books by Har-Peled [35], Samet [24] and Shakhnarovich et al. [29].

3 Definitions 

In this paper we study the approximate nearest neighbor problem for distance functions $D$: Given a point set $P$,
a query point $q$, and an error parameter $\varepsilon$, find a point $\mathrm{nn}_q \in P$ such that $D(\mathrm{nn}_q, q) \le (1+\varepsilon) \min_{p \in P} D(p,q)$.
We start by defining general properties that we will require of our distance measures. In what follows, we
will assume that the distance measure $D$ is reflexive: $D(x,y) = 0$ iff $x = y$.
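
As a concrete reading of this problem statement, the following minimal sketch checks by brute force whether a candidate point is a $(1+\varepsilon)$-approximate nearest neighbor. The function name `is_ann` and the one-dimensional squared distance used as the example are illustrative assumptions, not part of the paper.

```python
def is_ann(D, P, q, cand, eps):
    """True if cand is a (1+eps)-approximate nearest neighbor of q in P.

    D(p, q) is any distance function; the argument order follows the
    definition D(nn_q, q) <= (1 + eps) * min_p D(p, q).
    """
    best = min(D(p, q) for p in P)  # exact nearest-neighbor distance
    return D(cand, q) <= (1.0 + eps) * best


def sq(x, y):
    """Squared difference in one dimension (example distance only)."""
    return (x - y) ** 2


P = [0.0, 1.0, 3.0]
```

Such a linear scan costs $O(n)$ distance evaluations per query, which is the baseline the sublinear data structures in this paper aim to beat.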

Definition 3.1 (Monotonicity). Let $M \subset \mathbb{R}$, $D : M \times M \to \mathbb{R}$ be a distance function, and let $a,b,c \in M$ where
$a < b < c$. If the following are true for any such choice of $a$, $b$, and $c$: that $0 < D(a,b) < D(a,c)$, that
$0 < D(b,c) < D(a,c)$, and that $D(x,y) = 0$ iff $x = y$, then we say that $D$ is monotonic.

For a general distance function $D : M \times M \to \mathbb{R}$, where $M \subset \mathbb{R}^d$, we say that $D$ is monotonic if it is
monotonic when restricted to any subset of $M$ parallel to a coordinate axis.

Definition 3.2 (Reverse Triangle Inequality). Let $M$ be a subset of $\mathbb{R}$. We say that a monotone distance
measure $D : M \times M \to \mathbb{R}$ satisfies a reverse triangle inequality or RTI if for any three elements $a < b < c \in M$,
$D(a,b) + D(b,c) \le D(a,c)$.

Definition 3.3 ($\mu$-defectiveness). Let $D$ be a symmetric monotone distance measure satisfying the reverse
triangle inequality. We say that $D$ is $\mu$-defective with respect to domain $M$ if for all $a,b,q \in M$,

$|D(a,q) - D(b,q)| \le \mu D(a,b)$ (3.1)

For an asymmetric distance measure $D$, we define left and right sided $\mu$-defectiveness respectively as

$|D(q,a) - D(q,b)| \le \mu D(a,b)$ (3.2)

$|D(a,q) - D(b,q)| \le \mu D(b,a)$ (3.3)

Note that by interchanging $a$ and $b$ and using the symmetry of the modulus sign, we can also rewrite left and
right sided $\mu$-defectiveness respectively as $|D(q,a) - D(q,b)| \le \mu D(b,a)$ and $|D(a,q) - D(b,q)| \le \mu D(a,b)$.

Two technical notes. The distance functions under consideration are typically defined over $\mathbb{R}^d$. We will
assume in this paper that the distance $D$ is decomposable: roughly, that $D((x_1,\ldots,x_d),(y_1,\ldots,y_d))$ can be
written as $g(\sum_i f(x_i,y_i))$, where $g$ and $f$ are monotone. This captures all the Bregman divergences that are
typically used (with the exception of the Mahalanobis distance). We will also need to compute the diameter
of an axis parallel box of side length $\ell$. Our results hold as long as the diameter of such a box is $O(\ell d^{O(1)})$:
note that this captures standard distances like those induced by norms, as well as decomposable Bregman
divergences. In what follows, we will mostly make use of the square root of a Bregman divergence, for which
the diameter of a box is $(\mu+1)\sqrt{d}\,\ell$ or $\sqrt{d}\,\ell$, and so without loss of generality we will use this in our bounds.

Bregman Divergences. Let $\phi : M \subset \mathbb{R}^d \to \mathbb{R}$ be a strictly convex function that is differentiable in the
relative interior of $M$. The Bregman divergence $D_\phi$ is defined as

$D_\phi(x,y) = \phi(x) - \phi(y) - \langle \nabla\phi(y), x - y \rangle$ (3.4)

In general, $D_\phi$ is asymmetric. A symmetrized Bregman divergence can be defined by averaging:

$D_{s\phi}(x,y) = \frac{1}{2}\left(D_\phi(x,y) + D_\phi(y,x)\right) = \frac{1}{2}\langle x - y, \nabla\phi(x) - \nabla\phi(y) \rangle$ (3.5)

An important subclass of Bregman divergences are the decomposable Bregman divergences. Suppose $\phi$ has
domain $M = \prod_{i=1}^{d} M_i$ and can be written as $\phi(x) = \sum_{i=1}^{d} \phi_i(x_i)$, where $\phi_i : M_i \subset \mathbb{R} \to \mathbb{R}$ is also strictly convex
and differentiable in $\mathrm{relint}(M_i)$. Then $D_\phi(x,y) = \sum_{i=1}^{d} D_{\phi_i}(x_i,y_i)$ is a decomposable Bregman divergence.

Most commonly used Bregman divergences are decomposable: [11, Chapter 3] illustrates some of the
commonly used ones, including the Euclidean distance, the KL-divergence, and the Itakura-Saito distance.
In this paper we will hence limit ourselves to considering decomposable distance measures. We note that
due to the primal-dual relationship of $D_\phi(a,b)$ and $D_{\phi^*}(b^*,a^*)$, for our results on the asymmetric Bregman
divergence we need only consider right-sided $\mu$-defective distance measures.
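
To make definitions (3.4) and (3.5) concrete, here is a minimal sketch of a decomposable Bregman divergence computed coordinate by coordinate from a scalar generator and its derivative. The names `bregman`, `symmetrized` and `GENERATORS` are illustrative assumptions; the three generators are the standard ones for the squared Euclidean, (generalized) KL and Itakura-Saito divergences.

```python
import math


def bregman(phi, dphi, x, y):
    """Decomposable D_phi(x, y) = sum_i phi(x_i) - phi(y_i) - phi'(y_i) (x_i - y_i)."""
    return sum(phi(a) - phi(b) - dphi(b) * (a - b) for a, b in zip(x, y))


def symmetrized(phi, dphi, x, y):
    """Symmetrized divergence obtained by averaging the two directions."""
    return 0.5 * (bregman(phi, dphi, x, y) + bregman(phi, dphi, y, x))


# Scalar generators phi_i and their derivatives (same generator in every coordinate).
GENERATORS = {
    "squared_euclidean": (lambda t: t * t, lambda t: 2.0 * t),
    "kl": (lambda t: t * math.log(t), lambda t: math.log(t) + 1.0),
    "itakura_saito": (lambda t: -math.log(t), lambda t: -1.0 / t),
}
```

On the probability simplex the `"kl"` generator reduces to the usual KL-divergence, because the extra linear terms sum to zero; evaluating `bregman` in both argument orders shows the asymmetry directly.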

4 Properties of Bregman Divergences 

The previous section defined key properties that we desire of a distance function D. The Bregman divergences 
(or modifications thereof) satisfy the following properties, as can be shown by direct computation. 

Lemma 4.1. Any one-dimensional Bregman divergence is monotonic. 

Lemma 4.2. Any one-dimensional Bregman divergence satisfies the reverse triangle inequality. Let $a < b < c$
be three points in the domain of $D_\phi$. Then it holds that:

$D_\phi(a,b) + D_\phi(b,c) \le D_\phi(a,c)$ (4.1)
$D_\phi(c,b) + D_\phi(b,a) \le D_\phi(c,a)$ (4.2)

This is also true for $D_{s\phi}$ and $\sqrt{D_{s\phi}}$.

Note that this lemma can be extended similarly by induction to any series of $n$ points between $a$ and $c$.
Further, using the relationship between $D_\phi(a,b)$ and the "dual" distance $D_{\phi^*}(b^*,a^*)$, we can show that the
reverse triangle inequality holds going "left" as well: $D_\phi(c,b) + D_\phi(b,a) \le D_\phi(c,a)$. These two separate
reverse triangle inequalities together yield the result for $D_{s\phi}$. The corresponding proof for $\sqrt{D_{s\phi}}$ merely
requires some algebraic manipulation.
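
The direct computation behind inequality (4.1) can be written out in a few lines for a one-dimensional generator. This is a sketch that uses only definition (3.4) and the fact that the derivative of a convex function is nondecreasing; the function values cancel and only the derivative terms remain:

```latex
\begin{align*}
D_\phi(a,c) - D_\phi(a,b) - D_\phi(b,c)
  &= -\phi'(c)(a-c) + \phi'(b)(a-b) + \phi'(c)(b-c) \\
  &= (b-a)\bigl(\phi'(c) - \phi'(b)\bigr) \;\ge\; 0
  \qquad \text{for } a < b < c .
\end{align*}
```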

For the remainder of this section, the proofs are straightforward but tedious, and hence we consign them to
Appendix A. While the Bregman divergences satisfy both monotonicity and the reverse triangle inequality,
they are not $\mu$-defective with respect to any domain! An easy example of this is $\ell_2^2$, which is also a Bregman
divergence. A surprising fact however is that $\sqrt{D_{s\phi}}$ and $\sqrt{D_\phi}$ do satisfy $\mu$-defectiveness (with $\mu$ depending
on the bounded size of our domain). While we were unable to show precise bounds for $\mu$ in terms of the
domain, the values are small. For example, for the symmetrized KL-divergence on the simplex where each
coordinate is bounded between 0.1 and 0.9, $\mu$ is 1.22. If each coordinate is between 0.01 and 0.99, then $\mu$ is
2.42.
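
The contrast between a divergence and its square root can be observed numerically. The sketch below takes the squared Euclidean distance on $[0,1]$ as an assumed example of a Bregman divergence, and computes the smallest $\mu$ for which inequality (3.1) holds on a finite grid. The helper names `defect_ratio` and `grid` are hypothetical.

```python
import itertools


def defect_ratio(D, pts):
    """Smallest mu with |D(a,q) - D(b,q)| <= mu * D(a,b) over all a != b and q in pts."""
    worst = 0.0
    for a, b in itertools.combinations(pts, 2):
        for q in pts:
            worst = max(worst, abs(D(a, q) - D(b, q)) / D(a, b))
    return worst


def grid(n):
    """n equally spaced points on [0, 1]."""
    return [i / (n - 1) for i in range(n)]


def sq(x, y):
    """Squared Euclidean distance in one dimension."""
    return (x - y) ** 2


def root_sq(x, y):
    """Its square root, i.e. the ordinary distance |x - y|."""
    return abs(x - y)
```

For `sq` the ratio keeps growing as the grid is refined, since it equals $|a + b - 2q| / |a - b|$ and the denominator can be made arbitrarily small. For `root_sq` it never exceeds 1, which is just the ordinary triangle inequality.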

Lemma 4.3. Given any interval $I = [x_1 x_2]$ on the real line, there exists a finite $\mu$ such that $\sqrt{D_{s\phi}}$ is $\mu$-defective
with respect to $I$, and $\sqrt{D_\phi}$ is both left and right-sided $\mu$-defective with respect to $I$.

We note that the result for $\sqrt{D_\phi}$ is proven by establishing the following relationship between $D_\phi(a,b)$ and
$D_\phi(b,a)$ over a bounded interval $I \subset \mathbb{R}$, and with some further computation.

Lemma 4.4. Given a Bregman divergence $D_\phi$ and a bounded interval $I \subset \mathbb{R}$, $\sqrt{D_\phi(a,b)} / \sqrt{D_\phi(b,a)}$ is
bounded by a constant $\sqrt{c_0}$ $\forall a,b \in I$, where $c_0$ depends on the choice of divergence and interval.

We extend our results to $d$ dimensions naturally now by showing that if $M$ is a domain such that $\sqrt{D_{s\phi}}$ and
$\sqrt{D_\phi}$ are $\mu$-defective with respect to the projection of $M$ onto each coordinate axis, then $\sqrt{D_{s\phi}}$ and $\sqrt{D_\phi}$
are $\mu$-defective with respect to all of $M$.

5 Packing and Covering Bounds 

The aforementioned key properties (monotonicity, the reverse triangle inequality, decomposability, and
$\mu$-defectiveness) can be used to prove packing and covering bounds for a distance measure $D$. We now
present some of these bounds.

Lemma 5.1 (Interval packing). Consider a monotone distance measure $D$ satisfying the reverse triangle
inequality, an interval $[ab]$ such that $D(a,b) = s$ and a collection of disjoint intervals intersecting $[ab]$,
where $I = \{[xx'] \mid [xx'], D(x,x') \ge \ell\}$. Then $|I| \le \frac{s}{\ell} + 2$.

Proof. Let $I'$ be the intervals of $I$ that are totally contained in $[ab]$. The combined length of all intervals in $I'$
is at least $|I'|\ell$, but by the reverse triangle inequality, their total length cannot exceed $s$, so $|I'| \le \frac{s}{\ell}$. There can
be only two members of $I$ not in $I'$, so $|I| \le \frac{s}{\ell} + 2$. □

A simple greedy approach yields a constructive version of this lemma. 

Corollary 5.1. Given any two points $a < b$ on the line s.t. $D(a,b) = s$, we can construct a packing of $[ab]$
by $r \le \frac{1}{\varepsilon}$ intervals $[x_i x_{i+1}]$, $1 \le i \le r$ such that $D(a,x_0) = D(x_i,x_{i+1}) = \varepsilon s$, $\forall i$ and $D(x_r,b) \le \varepsilon s$. Here $D$ is a
monotone distance measure satisfying the reverse triangle inequality.
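
The greedy construction can be sketched as follows, assuming a one-dimensional distance that is continuous and monotone in its second argument so that each breakpoint can be located by bisection. The function name `greedy_packing`, the tolerance parameter, and the choice of returning the breakpoints starting from $a$ are illustrative assumptions rather than details given in the paper.

```python
def greedy_packing(D, a, b, eps, tol=1e-12):
    """Walk from a towards b, cutting off intervals of D-length eps * D(a, b).

    Returns the breakpoints [a, x_0, x_1, ...]; the leftover piece from the
    last breakpoint to b has D-length at most eps * D(a, b).
    """
    target = eps * D(a, b)
    points = [a]
    cur = a
    while D(cur, b) > target:
        lo, hi = cur, b
        # Bisection for the point x in (cur, b] with D(cur, x) = target;
        # valid because D(cur, .) increases monotonically to the right.
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if D(cur, mid) < target:
                lo = mid
            else:
                hi = mid
        cur = hi
        points.append(cur)
    return points


def sq(x, y):
    """Squared Euclidean distance in one dimension (example only)."""
    return (x - y) ** 2
```

By the reverse triangle inequality the full intervals have total length at most $s$, so the walk cuts off at most $1/\varepsilon$ of them before the remainder falls below $\varepsilon s$.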




Recall here that $D_\phi$, $D_{s\phi}$ and $\sqrt{D_{s\phi}}$ satisfy the conditions of Lemma 5.1 and Corollary 5.1 as they satisfy an
RTI and are decomposable. However, since $\sqrt{D_\phi}$ may not satisfy the reverse triangle inequality, we instead
prove a weaker packing bound on $\sqrt{D_\phi}$ by using $D_\phi$.

Lemma 5.2 (Weak interval packing). Given distance measure $\sqrt{D_\phi}$ and an interval $[ab]$ such that $\sqrt{D_\phi}(a,b) = s$
and a collection of disjoint intervals intersecting $[ab]$ where $I = \{[xx'] \mid [xx'], \sqrt{D_\phi}(x,x') \ge \ell\}$. Then
$|I| \le \frac{s^2}{\ell^2} + 2$. Such a set of intervals can be explicitly constructed.

Proof. We note that here $D_\phi(a,b) = s^2$, and $I = \{[xx'] \mid [xx'], D_\phi(x,x') \ge \ell^2\}$. The result then follows trivially
from Lemma 5.1 since $D_\phi$ satisfies the conditions of Lemma 5.1. □



The above bounds can be generalized to higher dimensions to provide packing bounds for balls and cubes
(which we define below) with respect to a monotone, decomposable distance measure.

Definition 5.1. Given a collection of $d$ intervals $[a_i b_i]$, s.t. $D(a_i,b_i) = s$ where $1 \le i \le d$, the cube in $d$
dimensions is defined as $\prod_{i=1}^{d} [a_i b_i]$ and is said to have side length $s$.

Lemma 5.3. Given a $d$ dimensional cube $B$ of side length $s$ under distance measures $D_\phi$, $D_{s\phi}$ and $\sqrt{D_{s\phi}}$,
we can cover it with at most $\frac{1}{\varepsilon^d}$ cubes of side length exactly $\varepsilon s$. In the case of $\sqrt{D_\phi}$, we can cover it with at
most $\frac{1}{\varepsilon^{2d}}$ cubes of side length $\varepsilon s$.

Proof. Note that $D_\phi$, $D_{s\phi}$, $\sqrt{D_{s\phi}}$ satisfy conditions of Corollary 5.1. Hence we can construct a gridding of
at most $\frac{1}{\varepsilon}$ points in each dimension spaced $\varepsilon s$ apart. We then take a product over all $d$ dimensions, and the
lemma follows trivially. For $\sqrt{D_\phi}$, we refer to the RTI for $D_\phi$ and follow the same procedure, gridding by at
most $\frac{1}{\varepsilon^2}$ points in each dimension, spaced $\varepsilon s$ apart. □

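Lemma 5.4. Consider a ball $B$ of radius $s$ and center $C$ with respect to a distance measure $D$. Then in the
case of $D_{s\phi}$ and $D_\phi$ it can be covered with $\frac{2^d}{\varepsilon^d}$ balls of radius $d\varepsilon s$. In the case of $\sqrt{D_{s\phi}}$, $B$ can be covered
with $\frac{2^d}{\varepsilon^d}$ balls of radius $\sqrt{d}\,\varepsilon s$. And for $\sqrt{D_\phi}$, $B$ can be covered by $\frac{2^d}{\varepsilon^{2d}}$ balls of radius $\sqrt{d}\,\varepsilon s$.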

Proof. We divide the ball into $2^d$ orthants around the center $C$. Each orthant can be covered by a cube of size
$s$. We now consider each case separately. For $D_{s\phi}$, $D_\phi$ and $\sqrt{D_{s\phi}}$, by Lemma 5.3 each such cube can be
broken down into $\frac{1}{\varepsilon^d}$ sub-cubes of side length $\varepsilon s$. For $\sqrt{D_\phi}$, we can break down each cube into $\frac{1}{\varepsilon^{2d}}$ sub-cubes
of side length $\varepsilon s$.

For $D_{s\phi}$ we can trivially cover each sub-cube by a ball of radius $d\varepsilon s$ placed at any corner. Similarly, for
$\sqrt{D_{s\phi}}$, we can cover each sub-cube by a ball of radius $\sqrt{d}\,\varepsilon s$ placed at any corner. (This latter result follows
by considering the sub-cube of side length $\varepsilon s$ under $\sqrt{D_{s\phi}}$ as one of side length $\varepsilon^2 s^2$ under $D_{s\phi}$ and placing a
ball of radius $d\varepsilon^2 s^2$ under $D_{s\phi}$ on any corner.)

We now consider the cases of $D_\phi$ and $\sqrt{D_\phi}$. For each orthant, we construct the gridding by Lemmas 5.1 and 5.2
in each dimension for $D_\phi$ and $\sqrt{D_\phi}$ respectively. This gives us $d$ sets of points $X_i$, $1 \le i \le d$,
where $X_i$ lies on the $i$-th axis passing through the center of the ball $C$. For each $X_i$, we have an ordering (by
construction) of points $C, x_{i1}, x_{i2}, \ldots$, s.t. $D(x_{ij}, x_{i(j+1)}) = \varepsilon s$. Clearly every subcube is induced by the product of
$d$ pairs of points of the form $\{x_{i(m_i - 1)}, x_{i m_i}\}$ where $1 \le i \le d$ and $m_i$ is some positive integer. Now to each
subcube assign the lowest corner $L_c$, defined as the product of the points $x_{i(m_i - 1)}$, $1 \le i \le d$. The Bregman
ball of radius $d\varepsilon s$ with center $L_c$ will cover this subcube for the case $D_\phi$, and the Bregman ball of radius
$\sqrt{d}\,\varepsilon s$ with center $L_c$ will cover this subcube for the case $\sqrt{D_\phi}$. Note that this argument will also extend to
covering the cells of a quadtree produced by recursive decomposition, by a ball of required size placed on
the appropriate "lowest" corner.

Since there are $\frac{1}{\varepsilon^{2d}}$ and $\frac{1}{\varepsilon^d}$ sub-cubes to each orthant for $\sqrt{D_\phi}$ and $D_\phi$ respectively, the lemma now follows
by covering each subcube with a Bregman ball of the required radius. □

6 Computing a rough approximation 

Armed with our packing and covering bounds, we now describe how to compute an $O(\log n)$ rough approximate
nearest-neighbor on our point set $P$, which we will use in the next section to find the $(1+\varepsilon)$-approximate
nearest neighbor. The technique we use is based on ring separators. Ring separators are a fairly old concept
in geometry, notable appearances of which include the landmark paper by Indyk and Motwani [25]. Our
approach here is heavily influenced by Har-Peled and Mendel [23], and by Krauthgamer and Lee [26], and
our presentation is along the template of the textbook by Har-Peled [35, Chapter 11].

We note here that the constant of $d^{d/2}$ which appears in our final bounds for storage and query time is
specific to $\sqrt{D_{s\phi}}$. However, an argument on the same lines will yield a constant of $d^{O(d)}$ for any generic
$\mu$-defective, symmetric RTI-satisfying decomposable distance measure such that the diameter of a cube of
side length 1 is bounded by $d^{O(1)}$.

Let $B(m,r)$ denote the ball of radius $r$ centered at $m$, and let $B'(m,r)$ denote the complement (or exterior)
of $B(m,r)$. A ring $R$ is the difference of two concentric balls: $R = B(m,r_2) \setminus B(m,r_1)$, $r_2 \ge r_1$. We will
often refer to the larger ball $B(m,r_2)$ as $B_{out}$ and the smaller ball as $B_{in}$. We use $P_{out}(R)$ to denote the set
$P \cap B'_{out}$, and similarly use $P_{in}(R)$ as $P \cap B_{in}$, where we may drop the reference to $R$ when the context is
obvious. A $t$-ring separator $R_{P,c}$ on a point set $P$ is a ring such that $\frac{n}{c} \le |P_{in}| \le (1 - \frac{1}{c})n$, $\frac{n}{c} \le |P_{out}| \le (1 - \frac{1}{c})n$,
$r_2 \ge (1+t) r_1$ and $B_{out} \setminus B_{in}$ is empty. A $t$-ring tree is a binary tree obtained by repeated partition of our
point set $P$ using a $t$-ring separator.

Note that later on in this section, we will abuse this notation slightly by using ring-separators where the
annulus is not actually empty, but we will bound the added space complexity and tree depth introduced.

Finally, denote the minimum sized ball containing at least $\frac{n}{c}$ points of $P$ by $B_{opt,c}$; its radius is denoted by
$r_{opt,c}$.

We demonstrate that for any point set $P$, a ring separator exists and secondly, it can always be computed
efficiently. Applying this "separator" recursively on our point structure yields a ring-tree structure for
searching our point set. Before we proceed further, we need to establish some properties of disks under a
$\mu$-defective distance. Lemma 6.1 is immediate from the definition of $\mu$-defectiveness. Lemma 6.2 is similar
to one obtained by Har-Peled and Mazumdar [22] and the idea of repeating points in both children of a
ring-separator derives from a result by Har-Peled and Mendel [23].

Lemma 6.1. Let $D$ be a $\mu$-defective distance, and let $B(m,r)$ be a ball with respect to $D$. Then for any two
points $x,y \in B(m,r)$, $D(x,y) \le (\mu+1)r$.
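
To see why Lemma 6.1 is immediate, one can apply symmetric $\mu$-defectiveness (3.1) to the triple $a = x$, $b = m$, $q = y$, and then use that both $x$ and $y$ lie within distance $r$ of $m$. A short sketch:

```latex
\[
D(x,y) \;\le\; D(m,y) + \bigl|D(x,y) - D(m,y)\bigr|
       \;\le\; D(m,y) + \mu\, D(x,m)
       \;\le\; r + \mu r \;=\; (\mu+1)\,r .
\]
```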

Lemma 6.2. Given a constant $1 \le c \le n$, we can compute in $O(nc)$ randomized time a $(\mu+1)$ approximation
to the smallest radius ball containing $\frac{n}{c}$ points.

Proof. As described by Har-Peled and Mazumdar [22], we let $S$ be a random sample from $P$, generated by
choosing every point of $P$ with probability $\frac{c}{n}$. Next, compute for every $p \in S$, the smallest disk centered at $p$
containing $\frac{n}{c}$ points. By median selection, this can be done in $O(n)$ time and since $E(|S|) = c$, this gives us
the expected running time of $O(nc)$. Now, let $r'$ be the minimum radius computed. Note that by Lemma 6.1 if
$|S \cap B_{opt,c}| > 0$ then we have that $r' \le (\mu+1) r_{opt,c}$. But since $B_{opt,c}$ contains $\frac{n}{c}$ points, we can upper bound the
probability of failure as the probability that we do not select any of the $\frac{n}{c}$ points in $B_{opt,c}$ in our sample. Hence:

$\Pr(|S \cap B_{opt,c}| > 0) = 1 - \left(1 - \frac{c}{n}\right)^{n/c} \ge 1 - \frac{1}{e}$

Note that one can obtain a similar approximation deterministically by brute force search, but this would
incur a prohibitive $O(n^2)$ running time. □
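
One trial of the sampling step in this proof can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: the function name `approx_smallest_ball` is hypothetical, `c` is taken to be an integer, and a full sort stands in for linear-time selection, so each sampled center costs $O(n \log n)$ here instead of $O(n)$.

```python
import random


def approx_smallest_ball(P, D, c, rng):
    """Sample each point with probability c/n and return the best ball found.

    Returns (center, radius) for a ball centred at a sampled point that
    contains at least ceil(n/c) points of P, or None if the sample is empty.
    """
    n = len(P)
    k = -(-n // c)  # ceil(n / c) for integer c
    sample = [p for p in P if rng.random() < c / n]
    best = None
    for p in sample:
        # Radius of the smallest ball centred at p holding k points (p included).
        # Sorting replaces the median-selection step of the proof.
        radius = sorted(D(p, x) for x in P)[k - 1]
        if best is None or radius < best[1]:
            best = (p, radius)
    return best
```

A caller would repeat the trial until it returns a ball and, as in the proof of Lemma 6.3 below, verify the point counts afterwards, since a single trial misses the optimal ball with probability up to $1/e$.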

We can now use Lemma 6.2 to construct our ring-separator.

Lemma 6.3. For arbitrary $t$ s.t. $1 < t < n$, we can construct a $\frac{1}{t}$-ring separator $R_{P,c}$ in $O(n)$ expected time on
a point set $P$ by repeating points.

Proof. Using Lemma 6.2 we compute a ball $S = B(m,r_1)$ (where $m \in P$) containing $\frac{n}{c}$ points such that
$r_1 \le (\mu+1) r_{opt,c}$ where $c$ is a constant to be set. Consider the ball $\hat{S} = B(m,2r_1)$. We shall argue that
there must be at least $\frac{n}{2}$ points of $P$ in the complement $\hat{S}'$, for careful choices of $c$. As described in Lemma 5.4, $\hat{S}$ can be covered
by $2^d$ hypercubes of side length $2r_1$, the union of which we shall refer to as $H$. Set $L = (\mu+1)\sqrt{d}$.
Imagine a partition of $H$ into a grid, where each cell is of side-length $\frac{r_1}{L}$ and hence of diameter at most
$\Delta(\frac{r_1}{L}, d) = \frac{r_1}{\mu+1} \le r_{opt,c}$. A ball of radius $r_{opt,c}$ on any corner of a cell will contain the entire cell, and so it will
contain at most $\frac{n}{c}$ points, by the definition of $r_{opt,c}$.

By Lemma 5.3 the grid on $H$ has at most $2^d \left(\frac{2r_1}{r_1/L}\right)^d = (4(\mu+1)\sqrt{d})^d$ cells. Set $c = 2(4(\mu+1)\sqrt{d})^d$.
Then we have that $\hat{S} \subset H$ contains at most $\frac{n}{c}(4(\mu+1)\sqrt{d})^d = \frac{n}{2}$ points. Since the inner ball $S$ contains at
least $\frac{n}{c}$ points, and the outer ball $\hat{S}$ contains at most $\frac{n}{2}$ points, the annulus $\hat{S} \setminus S$ contains at most $\frac{n}{2} - \frac{n}{c}$
points. Now, divide $\hat{S} \setminus S$ into $t$ rings of equal width, and by the pigeonhole principle at least one of these
rings must contain at most $\frac{n}{t}$ points of $P$. Now let the inner ball corresponding to this ring be $B_{in}$ and
the outer ball be $B_{out}$ and add these points to both children. Even for $t = 1$, each child contains at most
$(1 - \frac{1}{c})n$ points. Also, the thickness of the ring is bounded by $\frac{(2r_1 - r_1)/t}{2r_1} = \frac{1}{2t}$, i.e. it is an $O(\frac{1}{t})$
ring separator. Finally, we can check in $O(n)$ time if the randomized process of Lemma 6.2 succeeded simply
by verifying the number of points in the inner and outer ring. □



Lemma 6.4. Given any point set $P$, we can construct an $O(\frac{1}{\log n})$ ring-separator tree $T$ of depth
$O(d^{d/2}(\mu+1)^d \log n)$.

Proof. Repeatedly partition $P$ by Lemma 6.3 into $P^v_{in}$ and $P^v_{out}$ where $v$ is the parent node. Store only the single
point $\mathrm{rep}_v = m \in P$ in node $v$, the center of the ball $B(m, r_1)$. We continue this partitioning until we have
nodes with only a single point contained in them. Since each child contains at least $\frac{n}{c}$ points (by proof of
Lemma 6.3), each subset reduces by a factor of at least $1 - \frac{1}{c}$ at each step, and hence the depth of the tree is
logarithmic. We calculate the depth more exactly, noting that in Lemma 6.3, $c = O(d^{d/2}(\mu+1)^d)$. Hence the
depth $x$ can be bounded as:

$n\left(1 - \frac{1}{c}\right)^x \ge 1$

$x \le \frac{\ln \frac{1}{n}}{\ln(1 - \frac{1}{c})} = \frac{-1}{\ln(1 - \frac{1}{c})} \ln n$

$x \le c \ln n = O(d^{d/2}(\mu+1)^d \log n)$ □



Finally, we verify that the storage space require is not excessive. 



Lemma 6.5. To construct a 0(t-^) ring-separator tree requires Oin) storage and Oid^ ifX + lYnlogn) 
time. 



Proof. By Lemma 6.4 the depth bounds still hold upon repeating points. For storage, we have to bound the total number of points in our data structure after repetition, say $P_R$. Since each node corresponds to a splitting of $P_R$, there may be only $O(P_R)$ nodes and total storage. Note that in the proof of Lemma 6.3, for a node containing x points, at most an additional $\frac{x}{\log n}$ may be duplicated in the two children.

To bound this over each level of our tree, we sum across each node to bound the number of points $T_i$ at the i-th level. Note also that by Lemma 6.4 the tree depth is $O(\log n)$, i.e. bounded by $k \log n$ where k is a constant. Hence we only need to bound the storage at the level $i = O(\log n)$. We solve the recurrence, noting that $T_0 = n$ and $T_i \ge n$ for all i, and hence $T_i \le T_{i-1}\left(1 + \frac{1}{\log n}\right)$. Thus the recurrence works out to:

$$T_i \le n\left(1 + \frac{1}{\log n}\right)^{O(\log n)} \le n\left(\left(1 + \frac{1}{\log n}\right)^{\log n}\right)^{k} \le n e^{k},$$

where the main algebraic step is that $\left(1 + \frac{1}{x}\right)^x \le e$. This proves that the number of points, and hence our storage complexity, is $O(n)$. Multiplying the depth by $O(n)$ for computing the smallest ball across nodes on each level gives us the time complexity of $O(n \log n)$. We note that other tradeoffs are available for different values of approximation quality (t) and construction time / query time. □
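The storage recurrence can be checked numerically with a small sketch. It assumes base-2 logarithms (the text does not fix the base) and iterates the worst-case growth factor directly; the function name is hypothetical.

```python
import math

def points_at_depth(n, levels):
    """Iterate T_i = T_{i-1} * (1 + 1/log2(n)) starting from T_0 = n,
    i.e. the worst case where every level duplicates a 1/log(n) fraction."""
    growth = 1.0 + 1.0 / math.log2(n)
    total = float(n)
    for _ in range(levels):
        total *= growth
    return total
```

With depth $k \log n$ the total stays below $n e^k$, so the duplication inflates storage by only a constant factor.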

Algorithm and Quality Analysis. Let $\mathrm{best}_q$ be the best candidate for nearest neighbor to q found so far and $D_{near} = D(\mathrm{best}_q, q)$. Let $\mathrm{nn}_q$ be the exact nearest neighbor to q from point set P and $D_{exact} = D(\mathrm{nn}_q, q)$ be the exact nearest neighbor distance. Finally, let curr be the tree node currently being examined by our algorithm, and $\mathrm{rep}_{curr}$ be a representative point $p \in P$ of curr. By convention $r_v$ represents the radius of the inner ball associated with a node v, and within each node v we store $\mathrm{rep}_v = m_v$, which is the center of $B_{in}(m_v, r_v)$. The node associated with the inner ball $B_{in}$ is denoted by $v_{in}$ and the node associated with $B_{out}$ is denoted by $v_{out}$.

Lemma 6.6. Given a t-ring tree T for a point set with respect to a $\mu$-defective distance D, we can find a $O(\mu + \frac{\mu^2}{t})$ nearest neighbor in $O((\mu+1)^d d^{d/2} \log n)$ time.

Proof. Our search algorithm is a binary tree search. Whenever we reach node v, if $D(\mathrm{rep}_v, q) < D_{near}$ we set $\mathrm{best}_q = \mathrm{rep}_v$ and $D_{near} = D(\mathrm{rep}_v, q)$ as our current nearest neighbor and nearest neighbor distance respectively. Our branching criterion is that if $D(\mathrm{rep}_v, q) < (1 + \frac{t}{2}) r_v$, we continue the search in $v_{in}$, else we continue the search in $v_{out}$. Since the depth of the tree is $O(\log n)$ by Lemma 6.4, this process will take $O(\log n)$ time.

Turning now to quality, let w be the first node such that $\mathrm{nn}_q \in w_{in}$ but we searched in $w_{out}$, or vice versa. After examining $\mathrm{rep}_w$, $D_{near} \le D(\mathrm{rep}_w, q)$ and $D_{near}$ can only decrease at each step. An upper bound on $D(q, \mathrm{rep}_w)/D(q, \mathrm{nn}_q)$ yields a bound on the quality of the approximate nearest neighbor produced. In the first case, suppose $\mathrm{nn}_q \in w_{in}$, but we searched in $w_{out}$. Then $D(\mathrm{rep}_w, q) > (1 + \frac{t}{2}) r_w$ and $D(\mathrm{rep}_w, \mathrm{nn}_q) < r_w$. Now $\mu$-defectiveness implies that $\mu D(q, \mathrm{nn}_q) > D(\mathrm{rep}_w, q) - D(\mathrm{rep}_w, \mathrm{nn}_q)$, so we have $D(q, \mathrm{nn}_q) > \frac{t}{2\mu} r_w$. For the upper bound on $D(\mathrm{rep}_w, q)/D(q, \mathrm{nn}_q)$, we again apply $\mu$-defectiveness to conclude that $D(\mathrm{rep}_w, q) - D(q, \mathrm{nn}_q) < \mu D(\mathrm{nn}_q, \mathrm{rep}_w)$, which yields $\frac{D(\mathrm{rep}_w, q)}{D(q, \mathrm{nn}_q)} < 1 + \mu\frac{r_w}{D(q, \mathrm{nn}_q)} < 1 + \mu\frac{2\mu}{t} = 1 + \frac{2\mu^2}{t}$.

We now consider the other case. Suppose $\mathrm{nn}_q \in w_{out}$ and we search in $w_{in}$ instead. By construction we must have $D(\mathrm{rep}_w, q) < (1 + \frac{t}{2}) r_w$ and $D(\mathrm{rep}_w, \mathrm{nn}_q) > (1 + t) r_w$. Again, $\mu$-defectiveness yields $D(q, \mathrm{nn}_q) > \frac{t}{2\mu} r_w$.

Now we can simply take the ratio of the two: $\frac{D(\mathrm{rep}_w, q)}{D(q, \mathrm{nn}_q)} < \frac{(1 + \frac{t}{2}) r_w}{\frac{t}{2\mu} r_w} = \mu + \frac{2\mu}{t}$. Taking an upper bound of the approximation provided by each case, the ring tree provides us a $\mu + \frac{2\mu^2}{t}$ approximation. □
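The binary descent described in this proof can be sketched as follows. This is a minimal illustration under stated assumptions: the node layout (`RingNode` with fields `rep`, `radius`, `inner`, `outer`), the distance callback `dist`, and the use of a thickness parameter `t` in the branching test are all hypothetical names chosen for the example, and points are plain floats for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class RingNode:
    rep: float                          # representative point rep_v (ball center)
    radius: float = 0.0                 # inner-ball radius r_v
    inner: Optional["RingNode"] = None  # child v_in
    outer: Optional["RingNode"] = None  # child v_out

def ring_tree_search(root: Optional[RingNode], q: float,
                     dist: Callable[[float, float], float], t: float):
    """Descend one root-to-leaf path, tracking the best representative seen."""
    best, d_near = None, float("inf")
    node = root
    while node is not None:
        d = dist(node.rep, q)
        if d < d_near:
            best, d_near = node.rep, d
        # Branching criterion: go inside if q is within (1 + t/2) * r_v.
        node = node.inner if d <= (1 + t / 2) * node.radius else node.outer
    return best, d_near
```

Only one child is visited per node, so the running time is proportional to the tree depth; the quality loss comes entirely from the cases analysed above where the true nearest neighbor lies on the other side of the ring.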

Corollary 6.1. Setting $t = \frac{1}{\log n}$, we can find a $O(\mu + 2\mu^2 \log n)$ approximate nearest neighbor to a query point q in $O(d^{d/2}(\mu+1)^d \log n)$ time, using a $O(\frac{1}{\log n})$ ring separator tree.

Proof. By Lemma 6.4, Lemma 6.5 and Lemma 6.6. Note that we are slightly abusing notation in Lemma 6.3, in that the separating ring obtained there is not empty of points of P as originally stipulated. However, remember that if $\mathrm{nn}_q$ is in the ring, then $\mathrm{nn}_q$ repeats in both children and cannot fall off the search path. Hence we can "pretend" the ring is empty, as in our analysis in Lemma 6.6. □
7 Overall algorithm 

We now give our overall algorithm for obtaining a $1 + \varepsilon$ nearest neighbor in $O\left(\frac{1}{\varepsilon^d} \log^{2d} n\right)$ query time.

7.1 Preprocessing

We first construct an improved ring-tree R on our point set P in $O(n \log n)$ time as described in Lemma 6.5, with ring thickness $O(\frac{1}{\log n})$. We then compute an efficient orthogonal range reporting data structure on P in $O(n \log^{d-1} n)$ time, such as that described by Afshani et al. We note the main result we need:

Lemma 7.1. We can compute a data structure from P with $O(n \log^{d-1} n)$ storage (and the same construction time), such that given an arbitrary axis-parallel box B we can determine in $O(\log^d n)$ query time a point $p \in P \cap B$ if $|P \cap B| > 0$.

7.2 Query handling

Given a query point q, we use R to obtain a point $q_{rough}$ in $O(\log n)$ time such that $D_{rough} = D(q, q_{rough}) \le (1 + \mu^2 \log n) D(q, \mathrm{nn}_q)$. Given $q_{rough}$, we can use Lemma 5.4 to find a family F of $2^d$ cubes of side length exactly $D_{rough}$ such that they cover $B(q, D_{rough})$. We use our range reporting structure to find a point $p \in P$ for all non-empty cubes in F in a total of $2^d \log^d n$ time. These points act as representatives of the cubes for what follows. Note that $\mathrm{nn}_q$ must necessarily be in one of these cubes, and hence there must be a $(1 + \varepsilon)$-nearest neighbor $q_{approx} \in P$ in some $G \in F$. To locate this $q_{approx}$, we construct a quadtree [35, Chapter 11] [18] for repeated bisection and search on each $G \in F$.

Algorithm 1 describes the overall procedure. We call the collection of all cells produced during the procedure a quadtree. We borrow the presentation in Har-Peled's book [35] with the important qualifier that we construct our quadtree at runtime. The terminology here is as introduced earlier in Section 6.

Algorithm 1 QueryApproxNN(F, root, q)

  Instantiate a queue Q containing all cells from F along with their representatives, and enqueue root
  Let $D_{near} = D(\mathrm{rep}_{root}, q)$, $\mathrm{best}_q = \mathrm{rep}_{root}$
  repeat
    Pull off the head of the queue and place it in curr.
    if $D(\mathrm{rep}_{curr}, q) < D(\mathrm{best}_q, q)$ then
      Let $\mathrm{best}_q = \mathrm{rep}_{curr}$, $D_{near} = D(\mathrm{best}_q, q)$
    end if
    Bisect curr according to the procedure of Lemma 7.3; place the result in $\{G_i\}$.
    for all $G_i$ do
      As described in Lemma 7.3, check if $G_i$ is non-empty by passing it to our range reporting structure, which will also return some $p \in P$ if $G_i$ is not empty.
      Also check if $G_i$ may contain a point closer than $(1 - \frac{\varepsilon}{2}) D_{near}$ to q. (This may be done in $O(d)$ time for each cell, given the coordinates of the corners.)
      if $G_i$ is non-empty AND has a close enough point to q then
        Let $\mathrm{rep}_{G_i} = p$
        Enqueue $G_i$
      end if
    end for
  until Q is empty
  Return $\mathrm{best}_q$
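A runnable sketch of the subdivide-and-search loop is given below, under loudly stated simplifications: it uses Euclidean distance as a stand-in for the $\mu$-defective distance D, a brute-force scan (`point_in`) in place of the range reporting structure of Lemma 7.1, and a `min_side` guard to stop subdivision. All function names are hypothetical; this is an illustration of the control flow, not the authors' implementation.

```python
import itertools
import math
from collections import deque

def children(lo, hi):
    """Bisect the box [lo, hi] along every axis, yielding 2^d sub-boxes."""
    mid = [(a + b) / 2 for a, b in zip(lo, hi)]
    for choice in itertools.product((0, 1), repeat=len(lo)):
        yield (tuple(m if c else a for a, m, c in zip(lo, mid, choice)),
               tuple(b if c else m for m, b, c in zip(mid, hi, choice)))

def point_in(cell, pts):
    """Brute-force stand-in for range reporting: any point inside the cell."""
    lo, hi = cell
    for p in pts:
        if all(a <= x <= b for a, x, b in zip(lo, p, hi)):
            return p
    return None

def min_dist(cell, q):
    """Smallest Euclidean distance from q to any location in the cell."""
    lo, hi = cell
    return math.dist(q, [min(max(x, a), b) for a, x, b in zip(lo, q, hi)])

def approx_nn(cells, pts, q, eps, min_side=1e-6):
    queue = deque()
    best, d_near = None, math.inf
    for cell in cells:
        rep = point_in(cell, pts)
        if rep is not None:
            queue.append((cell, rep))
    while queue:
        cell, rep = queue.popleft()
        d = math.dist(rep, q)
        if d < d_near:
            best, d_near = rep, d
        lo, hi = cell
        if max(b - a for a, b in zip(lo, hi)) < min_side:
            continue  # guard: stop subdividing tiny cells
        for child in children(lo, hi):
            p = point_in(child, pts)
            # Expand only non-empty cells that may hold a point closer
            # than (1 - eps/2) * d_near.
            if p is not None and min_dist(child, q) < (1 - eps / 2) * d_near:
                queue.append((child, p))
    return best, d_near
```

Calling `approx_nn([((0.0, 0.0), (1.0, 1.0))], pts, q, 0.5)` searches a single root cube; in the paper's setting the initial cells would be the $2^d$ cubes of F and the emptiness test would cost $O(\log^d n)$ rather than a linear scan.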



Lemma 7.2. Algorithm 1 will always return a $(1 + \varepsilon)$-approximate nearest neighbor.

Proof. Let $\mathrm{best}_q$ be the point returned by the algorithm at the end of execution. By the method of the algorithm, for all points p for which the distance is directly evaluated, we have that $D(\mathrm{best}_q, q) \le D(p, q)$. The terminology here is as in Section 6. We look at points p which are not evaluated during the running of the algorithm, i.e. we did not expand their containing cells. But by the criterion of the algorithm for not expanding a cell, it must be that $D(\mathrm{best}_q, q)(1 - \frac{\varepsilon}{2}) \le D(p, q)$. For $\varepsilon < 1$, this means that $(1 + \varepsilon) D(p, q) \ge D(\mathrm{best}_q, q)$ for any $p \in P$, including $\mathrm{nn}_q$. So $\mathrm{best}_q$ is indeed a $1 + \varepsilon$ approximate nearest neighbor. □

We must analyze the time complexity of a single iteration of our algorithm, namely the complexity of a subdivision of a cube G and determining which of the $2^d$ subcells of G are non-empty.

Lemma 7.3. Let G be a cube with maximum side length s and $G_i$ its subcells produced by bisecting along each side of G. For all non-empty subcubes $G_i$ of G, we can find $p_i \in P \cap G_i$ in $O(2^d \log^d n)$ total time, and the maximum side length of any $G_i$ is at most $\frac{s}{2}$.

Proof. Note that G is defined as a product of d intervals. For each interval, we can find an approximate bisecting point in $O(1)$ time, and by the RTI each subinterval is of length at most $\frac{s}{2}$. This leads to an $O(d)$ cost to find a bisection point for all intervals, which define $O(2^d)$ subcubes or children.

We pass each subcube of G to our range reporting structure. By Lemma 7.1 this takes $O(\log^d n)$ time to check emptiness or return a point $p_i \in P$ contained in the child, if non-empty. Since there are $O(2^d)$ non-empty children of G, this implies a cost of $2^d \log^d n$ time.

Checking each of the non-empty subcubes $G_i$ to see if it may contain a point closer than $(1 - \frac{\varepsilon}{2}) D_{near}$ to q takes a further $O(d)$ time per cell, or $O(d 2^d)$ time. □

We now bound the number of cells that will be added to our search queue. We do so indirectly, by placing 
a lower bound on the maximum side length of all such cells. 

Lemma 7.4. Algorithm 1 will not add the children of node C to our search queue if the maximum side length of C is less than $\frac{\varepsilon D(q, \mathrm{nn}_q)}{2\mu\sqrt{d}}$.

Proof. Let $\Delta(C)$ represent the diameter of cell C. By construction, we can expand C only if some subcell of C contains a point p such that $D(p, q) \le (1 - \frac{\varepsilon}{2}) D_{near}$. Note that since C is examined, we have $D_{near} \le D(\mathrm{rep}_C, q)$. Now assuming we expand C, then we must have:

$$\mu\Delta(C) > D(\mathrm{rep}_C, q) - D(p, q) > D_{near} - \left(1 - \frac{\varepsilon}{2}\right) D_{near} = \frac{\varepsilon}{2} D_{near}. \qquad (7.1)$$

So $\frac{\varepsilon}{2\mu} D_{near} < \Delta(C)$. Here we used $D_{near} \le D(\mathrm{rep}_C, q)$, as noted above. Also, by definition, $D(q, \mathrm{nn}_q) \le D_{near}$, and $\Delta(C) \le \sqrt{d}\, s$ where s is the maximum side length of C. Making the appropriate substitutions yields our required bound. □



Given the bound on quadtree depth (Lemma 7.4), and using the fact that at most $2^{xd}$ nodes are expanded at level x, we have:

Lemma 7.5. Given a cube G of side length $D_{rough}$, we can compute a $(1 + \varepsilon)$-nearest neighbor to q in $O\left(\frac{1}{\varepsilon^d} 2^d \mu^d d^{d/2} \left(\frac{D_{rough}}{D(q, \mathrm{nn}_q)}\right)^d \log^d n\right)$ time.

Proof. Consider a quadtree search from q on a cube G of side length $D_{rough}$. By Lemma 7.4 our algorithm will not expand cells with all side lengths smaller than $\frac{\varepsilon D(q, \mathrm{nn}_q)}{2\mu\sqrt{d}}$. But since the side length reduces by at least half in each dimension upon each split, all side lengths are less than this value after $x = \log\left(D_{rough} \Big/ \frac{\varepsilon D(q, \mathrm{nn}_q)}{2\mu\sqrt{d}}\right)$ repeated bisections of our root cube.

Noting that $O(\log^d n)$ time is spent at each node by Lemma 7.3, and that at the x-th level the number of nodes expanded is $2^{xd}$, we get a final time complexity bound of $O\left(\frac{1}{\varepsilon^d} 2^d \mu^d d^{d/2} \left(\frac{D_{rough}}{D(q, \mathrm{nn}_q)}\right)^d \log^d n\right)$. □

Substituting $D_{rough} = \mu^2 \log n \, D(q, \mathrm{nn}_q)$ in Lemma 7.5 gives us a bound of $O\left(\frac{1}{\varepsilon^d} 2^d \mu^{3d} d^{d/2} \log^{2d} n\right)$. This time is per cube of F that covers $B(q, D_{rough})$. Noting that there are $2^d$ such cubes gives us a final time complexity of $O\left(\frac{1}{\varepsilon^d} 2^{2d} \mu^{3d} d^{d/2} \log^{2d} n\right)$. For the space complexity of our run-time queue, observe that the number of nodes in our queue increases only if a node has more than one non-empty child, i.e. there is a split of our n points. Since our point set may only split n times, this gives us a bound of $O(n)$ on the space complexity of our queue.
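The expansion threshold of Lemma 7.4 and the resulting depth in Lemma 7.5 can be made concrete with a small calculation. The helper names below are hypothetical, and base-2 logarithms are assumed since each split halves the side length.

```python
import math

def min_expandable_side(eps, mu, d, d_exact):
    """Side length below which a cell is never expanded (Lemma 7.4):
    eps * D(q, nn_q) / (2 * mu * sqrt(d))."""
    return eps * d_exact / (2 * mu * math.sqrt(d))

def max_depth(eps, mu, d, d_rough, d_exact):
    """Number of halvings needed to shrink a cube of side d_rough
    below the expansion threshold."""
    return math.ceil(math.log2(d_rough / min_expandable_side(eps, mu, d, d_exact)))
```

For example, with $\varepsilon = 0.5$, $\mu = 2$, $d = 4$, exact distance 1 and $D_{rough} = 16$, the threshold is 0.0625 and the search depth is 8, so at most $2^{8d}$ nodes would be expanded in the worst case.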

8 Logarithmic bounds, with further assumptions. 



For a given $D_\phi$, let $c_0 = \max_{i \in [1..d]} \sqrt{\frac{\max_x \phi_i''(x)}{\min_y \phi_i''(y)}}$. We conjecture that $c_0 = O(\mu)$, although we cannot prove it. In particular, we show that if we assume a bounded $c_0$ (in addition to $\mu$), we can obtain a $1 + \varepsilon$ nearest neighbor in $O\left(\log n + \left(\frac{1}{\varepsilon}\right)^d\right)$ time for $\sqrt{D_{s\phi}}$. We do so by constructing a Euclidean quadtree T on our set in preprocessing and using $c_0$ and $\mu$ to express the bounds obtained in terms of $\sqrt{D_{s\phi}}$.

We will refer to the Euclidean distance $\ell_2$ as $D_e$ and note first the following key relation between $\sqrt{D_{s\phi}}$ and $D_e$.

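Lemma 8.1. Suppose we are given an interval $I = [x_1 x_2] \subset \mathbb{R}$ s.t. $x_1 < x_2$, $D_e(x_1, x_2) = r_e$, and $\sqrt{D_{s\phi}}(x_1, x_2) = r_\phi$. Suppose we divide I into m subintervals of equal length with endpoints $x_1 = a_0, a_1, \ldots, a_{m-1}, a_m = x_2$, where $a_i < a_{i+1}$ and $D_e(a_i, a_{i+1}) = r_e/m$, $\forall i \in [0..m-1]$. Then $\frac{r_\phi}{c_0 m} \le \sqrt{D_{s\phi}}(a_i, a_{i+1}) \le \frac{c_0 r_\phi}{m}$.

Proof Sketch: We can relate $\sqrt{D_{s\phi}}$ to $D_e$ via the Taylor expansion of $\sqrt{D_{s\phi}}$: $\sqrt{D_{s\phi}}(a, b) = \sqrt{\phi''(\bar{x})}\, D_e(a, b)$ for some $\bar{x} \in [ab]$. Combining this with $c_0$ yields

$$\frac{\sqrt{D_{s\phi}}(a_i, a_{i+1})}{\sqrt{D_{s\phi}}(x_1, x_2)} \ge \frac{1}{c_0} \frac{D_e(a_i, a_{i+1})}{D_e(x_1, x_2)} = \frac{1}{c_0 m} \quad \text{and} \quad \frac{\sqrt{D_{s\phi}}(a_i, a_{i+1})}{\sqrt{D_{s\phi}}(x_1, x_2)} \le c_0 \frac{D_e(a_i, a_{i+1})}{D_e(x_1, x_2)} = \frac{c_0}{m}.$$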
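A numeric illustration of Lemma 8.1 follows, under assumptions chosen for the example rather than taken from the paper: the one-dimensional generator is $\phi(x) = x \ln x$ (so $\phi''(x) = 1/x$ and the symmetrized divergence is $(b - a)(\ln b - \ln a)$), and $c_0$ is taken as the square root of the ratio of the largest to the smallest second derivative on the interval. Function names are hypothetical.

```python
import math

def sqrt_sym_div(a, b):
    """sqrt of the symmetrized Bregman divergence for phi(x) = x ln x."""
    return math.sqrt((b - a) * (math.log(b) - math.log(a)))

def check_lemma(x1, x2, m):
    """Check r_phi/(c0*m) <= sqrt(D)(a_i, a_{i+1}) <= c0*r_phi/m for all i."""
    c0 = math.sqrt(x2 / x1)  # sqrt(max phi'' / min phi'') with phi''(x) = 1/x
    r_phi = sqrt_sym_div(x1, x2)
    step = (x2 - x1) / m     # equal Euclidean length r_e / m
    for i in range(m):
        a, b = x1 + i * step, x1 + (i + 1) * step
        v = sqrt_sym_div(a, b)
        if not (r_phi / (c0 * m) <= v <= c0 * r_phi / m):
            return False
    return True
```

On the interval [1, 4] this gives $c_0 = 2$, and every Euclidean-equal piece has $\sqrt{D_{s\phi}}$-length within a factor $c_0$ of $r_\phi / m$, which is what allows a Euclidean quadtree to stand in for a Bregman one.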



Corollary 8.1. If we recursively bisect an interval $I = [x_1 x_2] \subset \mathbb{R}$ s.t. $D_e(x_1, x_2) = r_e$ and $\sqrt{D_{s\phi}}(x_1, x_2) = r_\phi$ into $2^i$ equal subintervals (under $D_e$), then $\frac{r_\phi}{c_0 2^i} \le \sqrt{D_{s\phi}}(a_k, a_{k+1}) \le \frac{c_0 r_\phi}{2^i}$ for any of the subintervals $[a_k a_{k+1}]$ so obtained. Hence after $\log\frac{c_0 r_\phi}{x}$ subdivisions, all intervals will be of length at most x under $\sqrt{D_{s\phi}}$. Also, given a cube of initial side length $r_\phi$, after $\log\frac{c_0 r_\phi}{x}$ repeated bisections (under $D_e$) the diameter will be at most $\sqrt{d}\, x$ under $\sqrt{D_{s\phi}}$.

We find the smallest enclosing Bregman cube of side length s that bounds our point set, and then construct our compressed Euclidean quadtree in preprocessing. Corollary 8.1 gives us that for cells formed at the i-th level of decomposition, the side length under $\sqrt{D_{s\phi}}$ is between $\frac{s}{c_0 2^i}$ and $\frac{c_0 s}{2^i}$. Refer to these cells formed at the i-th level as $L_i$.

Lemma 8.2. Given a ball B of radius r under $\sqrt{D_{s\phi}}$, let $i = \log\frac{s}{c_0 r}$. Then $|L_i \cap B| \le O(2^d)$ and the side length of each cell in $L_i$ is between r and $c_0^2 r$ under $\sqrt{D_{s\phi}}$. We can also explicitly retrieve the quadtree cells corresponding to $|L_i \cap B|$ in $O(2^d \log n)$ time.

Proof. Note that for cells in $L_i$, we have side lengths under $\sqrt{D_{s\phi}}$ between $\frac{s}{c_0 2^i}$ and $\frac{c_0 s}{2^i}$ by Corollary 8.1. Substituting $i = \log\frac{s}{c_0 r}$, these cells have side length between r and $c_0^2 r$ under $\sqrt{D_{s\phi}}$. By the reverse triangle inequality and a similar argument to Lemma 5.4 we get our required bound for $|L_i \cap B|$. In preconstruction of our quadtree T we maintain for each dimension the corresponding interval quadtree $T_k$, $\forall k \in [1..d]$. Observe that this incurs at most $O(n)$ storage, with d in the big-Oh. For retrieving the actual cells $|L_i \cap B|$, we first find the $O(1)$ intervals from level i in each $T_k$ that may intersect B. Taking a product of these, we get $O(2^d)$ cells which are a superset of the canonical cells $L_i \subset T$. Each cell may be looked up in $O(\log n)$ time from the compressed quadtree [35], so our overall retrieval time is $O(2^d \log n)$. □

Given query point q, we first obtain in $O(\log n)$ time with our ring-tree a rough $O(n)$ ANN $q_{rough}$ s.t. $D_{rough} = \sqrt{D_{s\phi}}(q, q_{rough}) = \mu^2 n \sqrt{D_{s\phi}}(q, \mathrm{nn}_q)$. By Lemma 8.2 we have $O(2^d)$ quadtree cells intersecting $B(q, \sqrt{D_{s\phi}}(q, q_{rough}))$.

Let us call this collection of cells Q. We then carry out a quadtree search on each element of Q. Note that we expand only cells which may contain a point nearer to query point q than the current best candidate. We bound the depth of our search using $\mu$-defectiveness, similar to Lemma 7.4.



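Lemma 8.3. We will not expand cells of diameter less than $\frac{\varepsilon\sqrt{D_{s\phi}}(q, \mathrm{nn}_q)}{2\mu}$, or cells all of whose side-lengths w.r.t. $\sqrt{D_{s\phi}}$ are less than $\frac{\varepsilon\sqrt{D_{s\phi}}(q, \mathrm{nn}_q)}{2\mu\sqrt{d}}$.

For what follows, refer to our spread as $\beta = \frac{D_{rough}}{\sqrt{D_{s\phi}}(q, \mathrm{nn}_q)}$.

Lemma 8.4. We will only expand our tree to a depth of $k = \log(2 c_0^3 \mu \beta \sqrt{d} / \varepsilon)$.

Proof. Using Lemma 8.3 and Corollary 8.1, each cell of Q will be expanded only to a depth of $k = \log\left(c_0 c_0^2 D_{rough} \Big/ \frac{\varepsilon\sqrt{D_{s\phi}}(q, \mathrm{nn}_q)}{2\mu\sqrt{d}}\right)$. This gives us a depth of $\log(2 c_0^3 \mu \beta \sqrt{d} / \varepsilon)$. □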



Lemma 8.5. The number of cells examined at the i-th level is n,- < ifx'^d^c^ + (^)^ j. 



cells is at most under y'TJ^, by Corollary 



8.1 



Proof. Recalhng that the cells of Q start with side length at most CpDrough, at the i-th level the diameter of 

Hence by -defectiveness, there must be some point 

examined by our algorithm at distance at most Dbest = Y^Oi0(<?,nn^) + ^ffov^pWii Note that our algorithm 
will only expand cells within this distance of q. 

The side-length of a cell C at this level is at least A(C) = ^J°^f . Applying the packing bounds from Lemma 



5.3 and the fact that {a-\-b) < 2 {a +b ), the number of cells expanded is at most 

Finally we add the «, to get the total number of nodes explored: 

= O {lfpi'^dicl^\og{2co^p^^/e) +2^''cl^^'^di /e'') . 

i 

Recalling that j8 = '^■""g'' = ^2^^ substituting back and ignoring lower order terms, the time complexity 



IS 



O (2^/1^^/24^ I0g?2 + 22^C^V''J5 /£^) . 



Accounting for the $2^d$ cells in Q that we need to search, this adds a further $2^d$ multiplicative factor. We note that compressed Euclidean quadtrees can be built in $O(n \log n)$ time and require $O(n)$ space [35], which matches our bound for the ring-tree search phase of our algorithm requiring $O(n \log n)$ time and $O(n)$ space.

9 The General Case: Asymmetric Divergences

Without loss of generality we will focus on the right-sided nearest neighbor: given a point set P, query point q and $\varepsilon > 0$, find $x \in P$ that approximates $\min_{p \in P} D(p, q)$ to within a factor of $(1 + \varepsilon)$. Since a Bregman divergence is not in general $\mu$-defective, we will consider instead $\sqrt{D_\phi}$: by monotonicity and with an appropriate choice of $\varepsilon$, the result will carry over to $D_\phi$.

We list three issues that have to be resolved to complete the algorithm. Firstly, because of asymmetry, we cannot bound the diameter of a quadtree cell C of side length s by $s\sqrt{d}$. However, as the proof of Lemma 5.4 shows, we can choose a canonical corner of a cell such that a (directed) ball of radius $s\sqrt{d}$ centered at that corner covers the cell. By $\mu$-defectiveness, we can now conclude that the diameter of C is at most $(\mu + 1) s\sqrt{d}$ (note that this incurs an extra factor of $\mu + 1$ in all expressions). Secondly, while $\sqrt{D_\phi}$ satisfies $\mu$-defectiveness (unlike $D_\phi$), the opposite is true for the reverse triangle inequality, which is satisfied by $D_\phi$ but not $\sqrt{D_\phi}$. This requires the use of a weaker packing bound based on Lemma 5.2, introducing a dependence on $1/\varepsilon^2$ instead of $1/\varepsilon$. And thirdly, the lack of symmetry means we have to be careful about directionality when proving our bounds.

Note that for this section, when we speak of an asymmetric $\mu$-defective distance measure D, we are referring to $\sqrt{D_\phi}$. With some small adjustments, similar bounds can be obtained for more generic asymmetric, monotone, decomposable and $\mu$-defective distance measures satisfying packing bounds. The left-sided asymmetric nearest neighbor can be determined analogously.

Finally, given a bounded domain, we have that $\sqrt{D_\phi}$ is left-sided $\mu_l$-defective for some $\mu_l$ and right-sided $\mu_r$-defective for some $\mu_r$ (see Lemma A.4 for a detailed proof). For what follows, let $\mu = \max(\mu_l, \mu_r)$ and describe D as simply $\mu$-defective.

Most of the proofs here mirror their counterparts in Sections 6 and 7.

9.1 Asymmetric ring-trees

Since we focus on right-near-neighbors, all balls and ring separators referred to will use left-balls, i.e. balls $B(m, r) = \{x \mid D(m, x) < r\}$. As in Section 6, we will design a ring-separator algorithm and use that to build a ring-separator tree.

Lemma 9.1. Let D be a $\mu$-defective distance, and let $B(m, r)$ be a left-ball with respect to D. Then for any two points $x, y \in B(m, r)$, $D(x, y) < (\mu + 1) r$.



As in Lemma 6.2 we can construct (in $O(nc)$ expected time) a $(\mu + 1)$-approximate left-ball enclosing $\frac{n}{c}$ points. This in turn yields a ring-separator construction, and from it a ring tree with an extra $(\mu + 1)^d d^{d/2}$ factor in depth as compared to symmetric ring-trees, due to the weaker packing bounds for $\sqrt{D_\phi}$.

We note that the asymptotic bounds for ring-tree storage and construction time follow from purely combinatorial arguments and hence are unchanged for $\sqrt{D_\phi}$. Once we have the ring-tree, we can use it as before to identify a rough near-neighbor for a query q; once again, exploiting $\mu$-defectiveness gives us the desired approximation guarantee for the result.

Lemma 9.2. Given any constant $1 \le c \le n$, we can compute in $O(nc)$ randomized time a left-ball $B(m, r')$ such that $r' \le (\mu + 1) r_{opt,c}$ and $|B(m, r') \cap P| \ge \frac{n}{c}$.

Proof. The proof is similar enough to Lemma 6.2 that we omit details here. □



Lemma 9.3. There exists a constant c (which depends only on d and $\mu$), such that for any d-dimensional point set P and any $\mu$-defective distance D, we can find a $O(\frac{1}{t})$ left-ring separator $R_{P,c}$.

Proof. First, using our randomized construction, we compute a ball $S = B(m, r_1)$ (where $m \in P$) containing $\frac{n}{c}$ points such that $r_1 \le (\mu + 1) r_{opt,c}$, where c is a constant to be set. Consider the ball $\bar{S} = B(m, 2r_1)$. As described in Lemma 5.4, $\bar{S}$ can be covered by $2^d$ hypercubes of side length $2r_1$, the union of which we shall refer to as H. Set $L = (\mu + 1)\sqrt{d}$. Imagine a partition of H into a grid, where each cell is of side-length $\frac{r_1}{L}$. Each cell in this grid can be covered by a ball of radius $\Delta(\frac{r_1}{L}, d) = \frac{r_1}{L}\sqrt{d} \le r_{opt,c}$ centered on its lowest corner. This implies any cell will contain at most $\frac{n}{c}$ points, by the definition of $r_{opt,c}$.

By Lemma 5.3 the grid on H has at most $(4(\mu + 1)\sqrt{d})^{2d}$ cells. Each cell may contain at most $\frac{n}{c}$ points. In particular, set $c = 2(4(\mu + 1)\sqrt{d})^{2d}$. Then we have that H may contain at most $\frac{n}{c}(4(\mu + 1)\sqrt{d})^{2d} = \frac{n}{2}$ points, or since $\bar{S} \subset H$, $\bar{S}$ contains at most $\frac{n}{2}$ points and its complement $\bar{S}'$ contains at least $\frac{n}{2}$ points.

The rest of the proof goes through as in Lemma 6.3. □



We proceed now to the construction of our ring-tree using the basic ring-separator structure of Lemma 9.3.

Lemma 9.4. Given any point set P, we can construct a $O(\frac{1}{\log n})$ left ring-separator tree T of depth $O(d^d(\mu + 1)^{2d} \log n)$.

Proof. Repeatedly partition P by Lemma 9.3 into $P^v_{in}$ and $P^v_{out}$, where v is the parent node. Store only the single point $\mathrm{rep}_v = m \in P$ in node v, the center of the ball $B(m, r_1)$. We continue this partitioning until we have nodes with only a single point contained in them.

Since each child contains at least $\frac{n}{c}$ points, each subset reduces by a factor of at least $1 - \frac{1}{c}$ at each step, and hence the depth of the tree is logarithmic. We calculate the depth more exactly, noting that in Lemma 9.3 $c = O(d^d(\mu + 1)^{2d})$. Hence the depth x can be bounded as:

$$n\left(1 - \frac{1}{c}\right)^x = 1$$

$$x = \frac{\ln\frac{1}{n}}{\ln\left(1 - \frac{1}{c}\right)} = \frac{-1}{\ln\left(1 - \frac{1}{c}\right)} \ln n$$

$$x \le c \ln n = O\left(d^d(\mu + 1)^{2d} \log n\right)$$

□



Note that Lemma 9.4 also serves to bound the query time of our data structure. We need only now bound the approximation quality. The derivation is similar to Lemma 6.6 but with some care about directionality.

Lemma 9.5. Given a t-ring tree T for a point set with respect to a right-sided $\mu$-defective distance D, we can find a $O(\mu + \frac{\mu^2}{t})$ nearest neighbor in $O((\mu + 1)^{2d} d^d \log n)$ time.

Proof. Our search algorithm is a binary tree search. Whenever we reach node v, if $D(\mathrm{rep}_v, q) < D_{near}$ we set $\mathrm{best}_q = \mathrm{rep}_v$ and $D_{near} = D(\mathrm{rep}_v, q)$ as our current nearest neighbor and nearest neighbor distance respectively. Our branching criterion is that if $D(\mathrm{rep}_v, q) < (1 + \frac{t}{2}) r_v$, we continue the search in $v_{in}$, else we continue the search in $v_{out}$. Since the depth of the tree is $O(\log n)$ by Lemma 9.4, this process will take $O(\log n)$ time.

Figure 2: q is outside $(1 + \frac{t}{2}) r_w$ so we search $w_{out}$, but $\mathrm{nn}_q \in w_{in}$.

Let w be the first node such that $\mathrm{nn}_q \in w_{in}$ but we searched in $w_{out}$, or vice versa. The analysis goes by cases. In the first case, as seen in Figure 2, suppose $\mathrm{nn}_q \in w_{in}$, but we searched in $w_{out}$. Then

$$D(\mathrm{rep}_w, q) > \left(1 + \frac{t}{2}\right) r_w$$

$$D(\mathrm{rep}_w, \mathrm{nn}_q) < r_w.$$

Now left-sided $\mu$-defectiveness implies a lower bound on the value of $D(\mathrm{nn}_q, q)$:

$$\mu D(\mathrm{nn}_q, q) > D(\mathrm{rep}_w, q) - D(\mathrm{rep}_w, \mathrm{nn}_q)$$

$$\mu D(\mathrm{nn}_q, q) > \left(1 + \frac{t}{2}\right) r_w - r_w$$

$$D(\mathrm{nn}_q, q) > \frac{t}{2\mu} r_w.$$

And for the upper bound on $D(\mathrm{rep}_w, q)/D(\mathrm{nn}_q, q)$, first by right-sided $\mu$-defectiveness:

$$D(\mathrm{rep}_w, q) - D(\mathrm{nn}_q, q) < \mu D(\mathrm{rep}_w, \mathrm{nn}_q)$$

$$D(\mathrm{rep}_w, q) < D(\mathrm{nn}_q, q) + \mu r_w$$

$$\frac{D(\mathrm{rep}_w, q)}{D(\mathrm{nn}_q, q)} < 1 + \mu\frac{r_w}{D(\mathrm{nn}_q, q)}$$

$$\frac{D(\mathrm{rep}_w, q)}{D(\mathrm{nn}_q, q)} < 1 + \mu\frac{r_w}{\frac{t}{2\mu} r_w}$$

$$\frac{D(\mathrm{rep}_w, q)}{D(\mathrm{nn}_q, q)} < 1 + \frac{2\mu^2}{t}.$$

We now consider the other case. Suppose $\mathrm{nn}_q \in w_{out}$ and we search in $w_{in}$ instead. The analysis is almost identical. By construction we must have:

$$D(\mathrm{rep}_w, q) < \left(1 + \frac{t}{2}\right) r_w$$

$$D(\mathrm{rep}_w, \mathrm{nn}_q) > (1 + t) r_w.$$

Again, left-sided $\mu$-defectiveness yields:

$$D(\mathrm{nn}_q, q) > \frac{t}{2\mu} r_w.$$

We can simply take the ratio of the two:

$$\frac{D(\mathrm{rep}_w, q)}{D(\mathrm{nn}_q, q)} < \frac{\left(1 + \frac{t}{2}\right) r_w}{\frac{t}{2\mu} r_w} = \mu + \frac{2\mu}{t}.$$

Taking an upper bound of the approximation quality provided by each case, we get that the ring separator provides us a $\mu + \frac{2\mu^2}{t}$ rough approximation. □
9.2 Asymmetric quadtree decomposition

As in Section 7, we use the approximate near-neighbor returned by the ring-separator-tree query to progressively expand cells, using a subdivide-and-search procedure similar to Algorithm 1. A key difference is the procedure used to bisect a cell.

Lemma 9.6. Let G be a cube with maximum side length s and $G_i$ its subcells produced by partitioning each side of G into two equal intervals. For all non-empty subcubes $G_i$ of G, we can find $p_i \in P \cap G_i$ in $O(2^d \log^d n)$ total time, and the maximum side length of any $G_i$ is at most $\frac{s}{\sqrt{2}}$.

Proof. Note that G is defined as a product of d intervals. For each interval, we can find an approximate bisecting point in $O(1)$ time. Here the bisection point x of interval $[ab]$ is such that $\sqrt{D_\phi}(a, x) = \sqrt{D_\phi}(x, b)$. By resorting to the RTI for $D_\phi$, we get that $D_\phi(a, x) + D_\phi(x, b) \le s^2$ and hence $D_\phi(a, x) = D_\phi(x, b) \le \frac{s^2}{2}$, which implies $\sqrt{D_\phi}(a, x) = \sqrt{D_\phi}(x, b) \le \frac{s}{\sqrt{2}}$. The rest of our proof follows as in Lemma 7.3. □
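The bisection step of Lemma 9.6 can be illustrated numerically. The sketch below makes several assumptions that are not in the original: it uses the one-dimensional generator $\phi(x) = x \ln x$, whose Bregman divergence is $D_\phi(p, q) = p \ln(p/q) - p + q$, measures the side length as $s = \sqrt{D_\phi(a, b)}$, and finds the balance point by iterative bisection rather than in $O(1)$ time. Function names are hypothetical.

```python
import math

def bregman_kl(p, q):
    """1-D Bregman divergence for phi(x) = x ln x (generalized KL)."""
    return p * math.log(p / q) - p + q

def bisect_interval(a, b, iters=60):
    """Find x in [a, b] with D(a, x) == D(x, b) by numeric bisection.
    D(a, x) grows and D(x, b) shrinks as x moves from a to b."""
    lo, hi = a, b
    for _ in range(iters):
        x = (lo + hi) / 2
        if bregman_kl(a, x) < bregman_kl(x, b):
            lo = x
        else:
            hi = x
    return (lo + hi) / 2
```

Because $D_\phi(a, x) + D_\phi(x, b) \le D_\phi(a, b)$ for $a \le x \le b$, each half has squared length at most $s^2/2$, so the side length under $\sqrt{D_\phi}$ shrinks by a factor of at least $\sqrt{2}$ per split rather than 2.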



We now bound the number of cells that will be added to our search queue. We do so indirectly, by placing a lower bound on the maximum side length of all such cells, and note that for the asymmetric case we get an additional factor of $\frac{1}{\mu + 1}$.

Lemma 9.7. Algorithm 1 will not add the children of node C to our search queue if the maximum side length of C is less than $\frac{\varepsilon D(\mathrm{nn}_q, q)}{2\mu(\mu + 1)\sqrt{d}}$.

Proof. Let $\Delta(C)$ represent the maximum distance between any two points of cell C.

By construction, we can expand C only if some subcell of C contains a point p such that $D(p, q) \le (1 - \frac{\varepsilon}{2}) D_{near}$. Note that since C is examined, we have $D_{near} \le D(\mathrm{rep}_C, q)$. Now assuming we expand C, then we must have:

$$D(\mathrm{rep}_C, q) - D(p, q) < \mu\Delta(C)$$

$$D_{near} - \left(1 - \frac{\varepsilon}{2}\right) D_{near} < \mu\Delta(C)$$

$$\frac{\varepsilon}{2} D_{near} < \mu\Delta(C)$$

$$\frac{\varepsilon}{2\mu} D_{near} < \Delta(C).$$

Note that we substitute $D_{near} \le D(\mathrm{rep}_C, q)$, and that by the definition of $D_{near}$ as our candidate nearest neighbor distance, $D(\mathrm{nn}_q, q) \le D_{near}$. Our main modification from the symmetric case is that here $\Delta(C) \le (\mu + 1)\sqrt{d}\, s$, where s is the maximum side length of C, as opposed to $\sqrt{d}\, s$. Since cell C may be covered by a left-ball of radius $\sqrt{d}\, s$ placed at a suitably chosen corner (as explained in Lemma 5.4), Lemma 9.1 gives the required bound on $\Delta(C)$. □



The main difference between this lemma and Lemma 7.4 is the extra factor of $\mu + 1$ that we incur (as discussed) because of asymmetry. We only need to do a little more work to obtain our final bounds:

Lemma 9.8. Given a cube G of side length $D_{rough}$, and letting $x = \frac{1}{\varepsilon^d} 2^d \mu^d (\mu + 1)^d d^{d/2} \left(\frac{D_{rough}}{D(\mathrm{nn}_q, q)}\right)^d$, we can compute a $(1 + \varepsilon)$-right-sided nearest neighbor to q in $O(x^2 \log^d n)$ time.

Proof. Consider a quadtree search from q on a cube G of side length $D_{rough}$. By Lemma 9.7 our algorithm will not expand cells with all side lengths smaller than $\varepsilon D(\mathrm{nn}_q, q) / (2\mu(\mu + 1)\sqrt{d})$. But since the side length reduces by at least a factor of $\sqrt{2}$ in each dimension upon each split, all side lengths are less than this value after $k = \log_{\sqrt{2}}\left(2 D_{rough}\, \mu(\mu + 1)\sqrt{d} / \varepsilon D(\mathrm{nn}_q, q)\right)$ repeated bisections of our root cube.

Observe now that $O(\log^d n)$ time is spent at each node by Lemma 9.6, that at the k-th level the number of nodes expanded is $2^{kd}$, and that $2^{\log_{\sqrt{2}} n} = (2^{\log_2 n})^2$. We then get a final time complexity bound of $O\left((1/\varepsilon^{2d})\, 2^{2d} \mu^{2d} (\mu + 1)^{2d} d^d \left(D_{rough}/D(\mathrm{nn}_q, q)\right)^{2d} \log^d n\right)$. □

Substituting $D_{rough} = \mu^2 \log(n) D(\mathrm{nn}_q, q)$ in Lemma 9.8 gives us a bound of $O\left(\frac{1}{\varepsilon^{2d}} 2^{2d} \mu^{6d} (\mu + 1)^{2d} d^d \log^{3d} n\right)$. This time is per cube of F that covers the right-ball $B(q, D_{rough})$. Noting that there are $2^d$ such cubes gives us a final time complexity of $O\left(\frac{1}{\varepsilon^{2d}} 2^{3d} \mu^{6d} (\mu + 1)^{2d} d^d \log^{3d} n\right)$. The space bound follows as in Section 7.

Logarithmic bounds for asymmetric Bregman divergences. We now extend our logarithmic bounds from Section 8 to the asymmetric Bregman divergence $\sqrt{D_\phi}$. First note that the following lemma goes through by an identical argument to Lemma 8.1.

Lemma 9.9. Suppose we are given an interval $I = [x_1 x_2] \subset \mathbb{R}$ s.t. $x_1 < x_2$, $D_e(x_1, x_2) = r_e$ and $\sqrt{D_\phi}(x_1, x_2) = r_\phi$. Suppose we divide I into m subintervals of equal length with endpoints $x_1 = a_0 < a_1 < \ldots < a_{m-1} < a_m = x_2$, where $D_e(a_i, a_{i+1}) = r_e/m$ for all $i \in [0..m-1]$. Then $\frac{r_\phi}{c_0 m} \le \sqrt{D_\phi}(a_i, a_{i+1}) \le \frac{c_0 r_\phi}{m}$.

Corollary 9.1. If we recursively bisect an interval $I = [x_1 x_2] \subset \mathbb{R}$ s.t. $D_e(x_1, x_2) = r_e$ and $\sqrt{D_\phi}(x_1, x_2) = r_\phi$ into $2^i$ equal subintervals (under $D_e$), then $\frac{r_\phi}{c_0 2^i} \le \sqrt{D_\phi}(a_k, a_{k+1}) \le \frac{c_0 r_\phi}{2^i}$ for any of the subintervals $[a_k a_{k+1}]$ so obtained. Hence after $i = \lceil \log\frac{c_0 r_\phi}{x} \rceil$ subdivisions, all intervals will be of length at most x.

We now construct a compressed Euclidean quadtree as before, modifying the Section 8 analysis slightly to account for the weaker packing bounds for $\sqrt{D_\phi}$ and the extra $\mu + 1$ factor on the diameter of a cell.

Theorem 9.1. Given an asymmetric decomposable Bregman divergence $D_\phi$ that is $\mu$-defective over a domain with associated $c_0$ as in Section 8, we can compute a $(1 + \varepsilon)$-approximate right-near-neighbor in time



0({p. 



+ l)^^/flog« + (?22toM!^y/ 



We note our first new lemma, a slightly modified packing bound due to $\sqrt{D_\phi}$ not having a direct RTI.

Lemma 9.10. Given an interval $[x_1 x_2] \subset \mathbb{R}$ s.t. $\sqrt{D_\phi}(x_1, x_2) = r > 0$, and intervals with endpoints $a_0 < a_1 < \ldots < a_{m-1} < a_m$ s.t. for all $i \in [0..m-1]$, $\sqrt{D_\phi}(a_i, a_{i+1}) \ge \ell$, at most $O(\frac{c_0 r}{\ell})$ such intervals intersect $[x_1 x_2]$.

Proof. By the Lagrange form,

$$\frac{\ell}{r} \le \frac{\sqrt{D_\phi}(a_i, a_{i+1})}{\sqrt{D_\phi}(x_1, x_2)} \le c_0 \frac{D_e(a_i, a_{i+1})}{D_e(x_1, x_2)}, \qquad (9.1)$$

or we can say that $\frac{D_e(a_i, a_{i+1})}{D_e(x_1, x_2)} \ge \frac{\ell}{c_0 r}$. The RTI for $D_e$ then gives us the required result. □

Corollary 9.2. Given a ball B of radius r under $\sqrt{D_\phi}$, there can be at most $c_0^d \left(\frac{r}{\ell}\right)^d$ disjoint cubes that can intersect B, where each cube has side length at least $\ell$ under $\sqrt{D_\phi}$.

As before, we find the smallest enclosing Bregman cube of side length s that encloses our point set, and then construct a compressed Euclidean quadtree in preprocessing. Let $L_i$ denote the cells at the i-th level.

Lemma 9.11. Given a ball B of radius r under $\sqrt{D_\phi}$, let $i = \log\frac{s}{c_0 r}$. Then $|L_i \cap B| \le O(c_0^d)$ and the side length of each cell in $L_i$ is between r and $c_0^2 r$ under $\sqrt{D_\phi}$. We can also explicitly retrieve the quadtree cells corresponding to $|L_i \cap B|$ in $O(c_0^d \log n)$ time.

Proof. Note that for cells in $L_i$, we have side lengths between $\frac{s}{c_0 2^i}$ and $\frac{c_0 s}{2^i}$ by Corollary 9.1. Substituting $i = \log\frac{s}{c_0 r}$, these cells have side length between r and $c_0^2 r$ under $\sqrt{D_\phi}$. Now, we look in each dimension at the number of disjoint intervals of length at least r that can intersect B. By Lemma 9.10 this is at most $c_0$. The rest of the proof follows as in Lemma 8.2. □



We first obtain in $O(\log n)$ time with our asymmetric ring-tree an $O(n)$ ANN $q_{rough}$ to query point q, such that $\sqrt{D_\phi}(q_{rough}, q) = O\left(\mu^2 n \sqrt{D_\phi}(\mathrm{nn}_q, q)\right)$. We then use Lemma 9.11 to get $O(c_0^d)$ cells of our quadtree that intersect the right-ball $B(q, \sqrt{D_\phi}(q_{rough}, q))$.

Let us call this collection of cells Q. We then carry out a quadtree search on each element of Q. Note that we expand only cells which may contain a point nearer to query point q than the current best candidate.

We bound the depth of our search using $\mu$-defectiveness, similar to Lemma 8.4.



Lemma 9.12. We need only expand cells of diameter greater than $\frac{\varepsilon\sqrt{D_\phi}(\mathrm{nn}_q, q)}{2\mu}$.

Proof. By $\mu$-defectiveness, similar to Lemma 7.4. □

Corollary 9.3. We will not expand cells where the length of each side is less than $x = \frac{\varepsilon\sqrt{D_\phi}(\mathrm{nn}_q, q)}{2\mu(\mu + 1)\sqrt{d}}$.

Proof. Note that a quadtree cell C whose side length is less than x can be covered by a ball of radius $\sqrt{d}\, x$ under $\sqrt{D_\phi}$ with an appropriately chosen corner as center of the ball, as explained in the proof of Lemma 5.4. Now by Lemma 9.1, $\sqrt{D_\phi}(a, b) \le (\mu + 1)\sqrt{d}\, x$, $\forall a, b \in C$. Substituting for x from Lemma 9.12, the diameter of C is at most $\frac{\varepsilon\sqrt{D_\phi}(\mathrm{nn}_q, q)}{2\mu}$. □

Let the spread be $\beta = \frac{D_{rough}}{\sqrt{D_\phi}(\mathrm{nn}_q, q)} = O(\mu^2 n)$.

Lemma 9.13. We will only expand our tree to a depth of $k = \log(2 c_0^3 \mu(\mu + 1)\beta\sqrt{d} / \varepsilon)$.
Lemma 9.14. The number of cells expanded at the i-th level is nt < 2'' {iJ.'^d^C(f + {-^Y) 
Proof. Recalling that the cells of Q start with side length at most c^Dj-ough, at the i-th level the side length 



of a cell C is at most ^"^7"'^'' under ^/Z^^ by Corollary 



9.1 



And using Lemma 



9.1 



Hence by /i -defectiveness there must be a point at distance at most Dbest = \/D,j, {nnq,q) + 



AC < ^/^(Ai + 1)%^. 



D„ 



The side length of a cell C at this level is at least , so the number of cells expanded is at most 



4i 



Ac 



iinilJ. + \)Vdc'^ + '^Y, by Corollary 9.2 Using the fact that {a + bY < 2'' {a'^ + b'^) , 

□ 



we get nt < 2^ (^^''{^ + lY^^-c^'' + 

Simply summing up all /, the total number of nodes explored is 

(9(2>''(Ai + \Ycl''log{2cl^pVd/e) +22'^c^>^(Ai + lYd'' /s'' 
, or

O + 1 Ycl^ log n + 22^cf,V'' (At + 1 J2 /e^) 

, after substituting back for $\beta$ and ignoring smaller terms. Recalling that there are $c_0^d$ cells in $Q$ adds a further
$c_0^d$ multiplicative factor.

10 Further work 

A major open question is whether bounds independent of $\mu$-defectiveness can be obtained for the complexity
of ANN search under Bregman divergences. As we have seen, traditional grid-based methods rely heavily on
the triangle inequality and packing bounds, and there are technical difficulties in adapting other methods such
as cone decompositions lfT2l or approximate Voronoi diagrams 1,21,1 . We expect that we will need to exploit 
the geometry of Bregman divergences more substantially.
A Proofs from Section 4

We present here full versions of the proofs of $\mu$-defectiveness and the RTI from Section 4.

A.1 Proof of RTI for $\sqrt{D_{s\phi}}$

Lemma A.1. $\sqrt{D_{s\phi}}$ satisfies the reverse triangle inequality.

Proof. Fix $a < x < b$, and assume that the reverse triangle inequality does not hold:

$$\sqrt{D_{s\phi}(a,x)} + \sqrt{D_{s\phi}(x,b)} > \sqrt{D_{s\phi}(a,b)}$$

$$\sqrt{(x-a)(\phi'(x)-\phi'(a))} + \sqrt{(b-x)(\phi'(b)-\phi'(x))} > \sqrt{(b-a)(\phi'(b)-\phi'(a))}$$

Squaring both sides, we get:

$$(x-a)(\phi'(x)-\phi'(a)) + (b-x)(\phi'(b)-\phi'(x)) + 2\sqrt{(x-a)(b-x)(\phi'(x)-\phi'(a))(\phi'(b)-\phi'(x))} > (b-a)(\phi'(b)-\phi'(a))$$

$$(b-x)(\phi'(x)-\phi'(a)) + (x-a)(\phi'(b)-\phi'(x)) - 2\sqrt{(x-a)(b-x)(\phi'(x)-\phi'(a))(\phi'(b)-\phi'(x))} < 0$$

$$\left(\sqrt{(b-x)(\phi'(x)-\phi'(a))} - \sqrt{(x-a)(\phi'(b)-\phi'(x))}\right)^2 < 0$$

which is a contradiction, since the LHS is a perfect square. □
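The inequality just proved can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: it picks the generator $\phi(x) = x\log x - x$ (so $\phi'(x) = \log x$) as an example, and the helper names `sym_div` and `rti_holds` are hypothetical, not taken from the paper.

```python
import math
import random

def phi_prime(x):
    # Derivative of the assumed example generator phi(x) = x*log(x) - x.
    return math.log(x)

def sym_div(a, b, dphi=phi_prime):
    # Symmetrized divergence in one dimension, in the form used in the proof:
    # D_{s phi}(a, b) = (a - b) * (phi'(a) - phi'(b)).
    return (a - b) * (dphi(a) - dphi(b))

def rti_holds(a, x, b, dphi=phi_prime, tol=1e-9):
    # Reverse triangle inequality for sqrt(D_{s phi}) with a <= x <= b:
    # sqrt(D(a,x)) + sqrt(D(x,b)) <= sqrt(D(a,b)), up to a float tolerance.
    lhs = math.sqrt(sym_div(a, x, dphi)) + math.sqrt(sym_div(x, b, dphi))
    rhs = math.sqrt(sym_div(a, b, dphi))
    return lhs <= rhs + tol

if __name__ == "__main__":
    random.seed(0)
    triples = [sorted(random.uniform(0.1, 10.0) for _ in range(3)) for _ in range(5)]
    for a, x, b in triples:
        print(round(a, 3), round(x, 3), round(b, 3), rti_holds(a, x, b))
```

A random check of this kind does not replace the proof; it only confirms that the algebra above behaves as expected on one concrete strictly convex generator.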



A.2 Proof of Lemma 4.3

Lemma A.2. Given any interval $I = [x_1 x_2]$ on the real line, there exists a finite $\mu$ such that $\sqrt{D_{s\phi}}$ is
$\mu$-defective with respect to $I$.

Proof. Consider three points $a, b, q \in I$.

Due to symmetry of the cases and conditions, there are three cases to consider: $a < q < b$, $a < b < q$ and
$q < b < a$.

Case 1: Here $a < q < b$. The following is trivially true by the monotonicity of $\sqrt{D_{s\phi}}$:

$$\left|\sqrt{D_{s\phi}(q,a)} - \sqrt{D_{s\phi}(q,b)}\right| < \sqrt{D_{s\phi}(a,b)} \tag{A.1}$$

Cases 2 and 3: For the remaining symmetric cases, $a < b < q$ and $q < b < a$, note that since $\sqrt{D_{s\phi}(q,a)} - \sqrt{D_{s\phi}(q,b)}$
and $\sqrt{D_{s\phi}(a,b)}$ are both bounded, continuous functions on a compact domain (the interval
$[x_1 x_2]$), we need only show that the following limit exists:

$$\lim_{a \to b} \frac{\left|\sqrt{D_{s\phi}(q,a)} - \sqrt{D_{s\phi}(q,b)}\right|}{\sqrt{D_{s\phi}(a,b)}} \tag{A.2}$$

First consider $a < b < q$, and we take the limit $b \to a$. We will use the following substitutions repeatedly in
our derivation: $b = a + h$, $\lim_{h \to 0} \phi(a+h) = \lim_{h \to 0} (\phi(a) + h\phi'(a))$, and $\lim_{h \to 0} \sqrt{1+h} = \lim_{h \to 0} (1 + h/2)$.
For ease of computation, we replace $\phi'$ by $\psi$, to be restored at the last step.

$$\lim_{a \to b} \frac{\sqrt{D_{s\phi}(a,q)} - \sqrt{D_{s\phi}(b,q)}}{\sqrt{D_{s\phi}(a,b)}} = \frac{\lim_{a \to b} \left(\sqrt{(q-a)(\psi(q)-\psi(a))} - \sqrt{(q-b)(\psi(q)-\psi(b))}\right)}{\lim_{a \to b} \sqrt{(b-a)(\psi(b)-\psi(a))}} \tag{A.3}$$



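Computing the denominator:

$$\begin{aligned}
\lim_{b \to a} \sqrt{(b-a)(\psi(b)-\psi(a))} &= \lim_{h \to 0} \sqrt{(a+h-a)(\psi(a+h)-\psi(a))} \\
&= \lim_{h \to 0} \sqrt{h(\psi(a) + h\psi'(a) - \psi(a))} \\
&= \lim_{h \to 0} \sqrt{h(h\psi'(a))} = \lim_{h \to 0} h\sqrt{\psi'(a)}
\end{aligned}$$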

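We now address the numerator:

$$\begin{aligned}
&\lim_{b \to a} \sqrt{(q-a)(\psi(q)-\psi(a))} - \sqrt{(q-b)(\psi(q)-\psi(b))} \\
&= \lim_{h \to 0} \sqrt{(q-a)(\psi(q)-\psi(a))} - \sqrt{(q-a-h)(\psi(q)-\psi(a)-h\psi'(a))} \\
&= \lim_{h \to 0} \sqrt{(q-a)(\psi(q)-\psi(a))} - \sqrt{(q-a)\left(1 - \frac{h}{q-a}\right)(\psi(q)-\psi(a))\left(1 - \frac{h\psi'(a)}{\psi(q)-\psi(a)}\right)} \\
&= \lim_{h \to 0} \sqrt{(\psi(q)-\psi(a))(q-a)} \left(1 - \sqrt{\left(1 - \frac{h}{q-a}\right)\left(1 - \frac{h\psi'(a)}{\psi(q)-\psi(a)}\right)}\right)
\end{aligned}$$

Dropping higher order terms of $h$, the above reduces to:

$$\lim_{h \to 0} h\sqrt{(\psi(q)-\psi(a))(q-a)} \left(\frac{1}{2(q-a)} + \frac{\psi'(a)}{2(\psi(q)-\psi(a))}\right)$$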
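Now combine numerator and denominator back in equation A.3:

$$\lim_{b \to a} \frac{\sqrt{D_{s\phi}(a,q)} - \sqrt{D_{s\phi}(b,q)}}{\sqrt{D_{s\phi}(a,b)}} = \frac{\lim_{h \to 0} h\sqrt{(\psi(q)-\psi(a))(q-a)} \left(\frac{1}{2(q-a)} + \frac{\psi'(a)}{2(\psi(q)-\psi(a))}\right)}{\lim_{h \to 0} h\sqrt{\psi'(a)}} = \frac{1}{2}\left(\sqrt{\frac{\psi(q)-\psi(a)}{\psi'(a)(q-a)}} + \sqrt{\frac{\psi'(a)(q-a)}{\psi(q)-\psi(a)}}\right)$$

Substituting back $\phi'(x)$ for $\psi(x)$, we see that limit A.2 exists, provided $\phi$ is strictly convex:

$$\frac{1}{2}\left(\sqrt{\frac{\phi'(q)-\phi'(a)}{\phi''(a)(q-a)}} + \sqrt{\frac{\phi''(a)(q-a)}{\phi'(q)-\phi'(a)}}\right) \tag{A.4}$$

The analysis follows symmetrically for case 3, where $q < b < a$. □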
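As a numerical illustration of the closed form (A.4), the following sketch compares the finite-difference ratio from (A.2) with the limit expression. It assumes the example generator $\phi(x) = x\log x - x$, so that $\phi'(x) = \log x$ and $\phi''(x) = 1/x$; the function names `defect_ratio` and `limit_a4` are hypothetical and chosen only for this example.

```python
import math

def dphi(x):
    # phi'(x) for the assumed example generator phi(x) = x*log(x) - x.
    return math.log(x)

def d2phi(x):
    # phi''(x) for the same assumed generator.
    return 1.0 / x

def sym_div(a, b):
    # One-dimensional symmetrized divergence (a - b)(phi'(a) - phi'(b)).
    return (a - b) * (dphi(a) - dphi(b))

def defect_ratio(a, h, q):
    # Ratio inside limit (A.2) with b = a + h, for a < b < q.
    b = a + h
    num = abs(math.sqrt(sym_div(q, a)) - math.sqrt(sym_div(q, b)))
    return num / math.sqrt(sym_div(a, b))

def limit_a4(a, q):
    # Closed form (A.4): 0.5 * (sqrt(s) + 1/sqrt(s)) with
    # s = (phi'(q) - phi'(a)) / (phi''(a) * (q - a)).
    s = (dphi(q) - dphi(a)) / (d2phi(a) * (q - a))
    return 0.5 * (math.sqrt(s) + 1.0 / math.sqrt(s))

if __name__ == "__main__":
    for h in (1e-2, 1e-4, 1e-6):
        print(h, defect_ratio(1.0, h, 2.0), limit_a4(1.0, 2.0))
```

As $h$ shrinks, the printed ratio should approach the value of (A.4); the agreement is only to first order in $h$, which matches the terms dropped in the derivation.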



We now show that right-sided $\mu$-defectiveness holds for $\sqrt{D_\phi}$. To show this, we need to establish the
following relationship between $D_\phi(a,b)$ and $D_\phi(b,a)$ over a bounded interval $I \subset \mathbb{R}$.

A.3 Proof of Lemma 4.4

Lemma A.3. Given a Bregman divergence $D_\phi$ and a bounded interval $I \subset \mathbb{R}$, we have that $\sqrt{D_\phi(a,b)} / \sqrt{D_\phi(b,a)}$
is bounded by some constant $c_0$ for all $a, b \in I$, where $c_0$ depends on the choice of divergence and interval.

Proof. By continuity and compactness, over a finite interval $I$ we have that $c_0 = \max_x \phi''(x) / \min_y \phi''(y)$ is
bounded. Now by using the Lagrange form of $D_\phi(a,b)$, that is, $D_\phi(a,b) = \frac{1}{2}\phi''(x)(a-b)^2$ for some $x$ between $a$ and $b$, we get that $\sqrt{D_\phi(a,b)} / \sqrt{D_\phi(b,a)} \le \sqrt{c_0}$. □
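The bound in Lemma A.3 can be illustrated on a concrete case. The sketch below assumes the example generator $\phi(x) = x\log x - x$ on the interval $[1, 4]$, for which $\phi''(x) = 1/x$ and hence $c_0 = 4$; the names `breg` and `max_sqrt_ratio` are hypothetical helpers for this illustration only.

```python
import itertools
import math

def phi(x):
    # Assumed example generator phi(x) = x*log(x) - x.
    return x * math.log(x) - x

def dphi(x):
    # Its derivative, phi'(x) = log(x).
    return math.log(x)

def breg(a, b):
    # Asymmetric Bregman divergence D_phi(a, b) = phi(a) - phi(b) - phi'(b)(a - b).
    return phi(a) - phi(b) - dphi(b) * (a - b)

def max_sqrt_ratio(points):
    # Largest value of sqrt(D_phi(a,b)) / sqrt(D_phi(b,a)) over ordered pairs
    # of distinct points.
    return max(
        math.sqrt(breg(a, b) / breg(b, a))
        for a, b in itertools.permutations(points, 2)
    )

if __name__ == "__main__":
    grid = [1.0 + 0.1 * k for k in range(31)]  # grid on [1, 4]
    c0 = 4.0  # max phi'' / min phi'' = (1/1) / (1/4) on [1, 4]
    print(max_sqrt_ratio(grid), math.sqrt(c0))
```

On this grid the observed maximum stays below $\sqrt{c_0} = 2$, as the Lagrange-form argument predicts.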

A.4 Proof that $\sqrt{D_\phi}$ is $\mu$-defective

Lemma A.4. Given any interval $I = [x_1 x_2]$ on the real line, there exists a finite $\mu$ such that $\sqrt{D_\phi}$ is right-sided
$\mu$-defective with respect to $I$.

Proof. Consider any three points $a, b, q \in I$. We will prove that there exists finite $\mu$ such that:

$$\left|\sqrt{D_\phi(a,q)} - \sqrt{D_\phi(b,q)}\right| < \mu \sqrt{D_\phi(b,a)} \tag{A.5}$$

Here there are now six cases to consider: $a < q < b$, $b < q < a$, $a < b < q$, $b < a < q$, $q < b < a$, and
$q < a < b$.



Cases 1 and 2: Here $a < q < b$. By monotonicity we have that:

$$\left|\sqrt{D_\phi(a,q)} - \sqrt{D_\phi(b,q)}\right| < \sqrt{D_\phi(a,b)} + \sqrt{D_\phi(b,a)} \tag{A.6}$$

But by Lemma 4.4 we have that $\sqrt{D_\phi(a,b)} < c\sqrt{D_\phi(b,a)}$ for some constant $c$ defined over $I$. This
implies that $\left|\sqrt{D_\phi(a,q)} - \sqrt{D_\phi(b,q)}\right| / \sqrt{D_\phi(b,a)} < c + 1$, i.e., it is bounded over $I$. A similar analysis
works for Case 2, where $b < q < a$.
Cases 3 and 4: For these two cases, $a < b < q$ and $b < a < q$, note that since $\sqrt{D_\phi(a,q)} - \sqrt{D_\phi(b,q)}$ and
$\sqrt{D_\phi(b,a)}$ are both bounded, continuous functions on a compact domain (the interval $[x_1 x_2]$), we need
only show that the following limit exists:

$$\lim_{a \to b} \frac{\left|\sqrt{D_\phi(a,q)} - \sqrt{D_\phi(b,q)}\right|}{\sqrt{D_\phi(b,a)}} \tag{A.7}$$

First consider $a < b < q$, and we take the limit $b \to a$. We will use the following substitutions repeatedly
in our derivation: $b = a + h$, $\lim_{h \to 0} \psi(a+h) = \lim_{h \to 0} (\psi(a) + h\psi'(a))$, $\lim_{h \to 0} \phi(b) = \lim_{h \to 0} \phi(a+h) = \lim_{h \to 0} \left(\phi(a) + h\psi(a) + \frac{h^2}{2}\psi'(a)\right)$ and $\lim_{h \to 0} \sqrt{1+h} = \lim_{h \to 0} (1 + h/2)$. For ease of computation, we
replace $\phi'$ by $\psi$, to be restored at the last step.

$$\lim_{a \to b} \frac{\sqrt{D_\phi(a,q)} - \sqrt{D_\phi(b,q)}}{\sqrt{D_\phi(b,a)}} = \lim_{a \to b} \frac{\sqrt{\phi(a) - \phi(q) - \psi(q)(a-q)} - \sqrt{\phi(b) - \phi(q) - \psi(q)(b-q)}}{\sqrt{\phi(b) - \phi(a) - \psi(a)(b-a)}} \tag{A.8}$$

Computing the denominator:

$$\lim_{b \to a} \sqrt{\phi(b) - \phi(a) - \psi(a)(b-a)} = \lim_{h \to 0} \sqrt{\phi(a) + h\psi(a) + \frac{h^2}{2}\psi'(a) - \phi(a) - h\psi(a)} = \lim_{h \to 0} h\sqrt{\frac{\psi'(a)}{2}}$$

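We now address the numerator:

$$\begin{aligned}
&\lim_{b \to a} \sqrt{\phi(a) - \phi(q) - \psi(q)(a-q)} - \sqrt{\phi(b) - \phi(q) - \psi(q)(b-q)} \\
&= \lim_{h \to 0} \sqrt{\phi(a) - \phi(q) - \psi(q)(a-q)} - \sqrt{\phi(a) - \phi(q) - \psi(q)(a-q) + h(\psi(a) - \psi(q))} \\
&= \lim_{h \to 0} \sqrt{D_\phi(a,q)} - \sqrt{D_\phi(a,q)} \left(1 + \frac{h(\psi(a) - \psi(q))}{D_\phi(a,q)}\right)^{1/2} \\
&= \lim_{h \to 0} \sqrt{D_\phi(a,q)} \left(1 - \left(1 - \frac{h(\psi(q) - \psi(a))}{D_\phi(a,q)}\right)^{1/2}\right) \\
&= \lim_{h \to 0} \sqrt{D_\phi(a,q)} \left(1 - \left(1 - \frac{h(\psi(q) - \psi(a))}{2 D_\phi(a,q)}\right)\right) \\
&= \lim_{h \to 0} \frac{h(\psi(q) - \psi(a))}{2\sqrt{D_\phi(a,q)}}
\end{aligned}$$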



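Now combine numerator and denominator back in equation A.8 and note that $D_\phi(a,q) = \frac{1}{2}\psi'(x)(q-a)^2$
for some $x$ between $a$ and $q$.

$$\lim_{a \to b} \frac{\sqrt{D_\phi(a,q)} - \sqrt{D_\phi(b,q)}}{\sqrt{D_\phi(b,a)}} = \frac{\lim_{h \to 0} \frac{h(\psi(q) - \psi(a))}{2\sqrt{D_\phi(a,q)}}}{\lim_{h \to 0} h\sqrt{\frac{\psi'(a)}{2}}} = \frac{\psi(q) - \psi(a)}{(q-a)\sqrt{\psi'(a)}\sqrt{\psi'(x)}}$$

Substituting back $\phi'(x)$ for $\psi(x)$, we see that limit A.7 exists, provided $\phi$ is strictly convex:

$$\frac{\phi'(q) - \phi'(a)}{(q-a)\sqrt{\phi''(a)}\sqrt{\phi''(x)}} \tag{A.9}$$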



The analysis follows symmetrically for case 4, by noting that $\lim_{a \to b} \sqrt{D_\phi(a,b)} / \sqrt{D_\phi(b,a)} = 1$ and that
$\sqrt{D_\phi(a,q)} - \sqrt{D_\phi(b,q)} = -\left(\sqrt{D_\phi(b,q)} - \sqrt{D_\phi(a,q)}\right)$, i.e., we may suitably interchange $a$ and $b$.

Cases 5 and 6: Here $q < a < b$ or $q < b < a$. Looking more carefully at the analysis for cases 3 and 4, note
that the ordering $q < a < b$ vs. $a < b < q$ does not affect the magnitude of the expression for limit A.7,
only the sign. Hence we can use the same analysis to prove $\mu$-defectiveness for cases 5 and 6. □
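The limit in cases 3 and 4 can also be checked numerically. The sketch below again assumes the example generator $\phi(x) = x\log x - x$ and uses hypothetical helper names. It evaluates the limit in the equivalent form $(\phi'(q) - \phi'(a)) / \sqrt{2\phi''(a) D_\phi(a,q)}$, which is the ratio of the numerator and denominator limits derived above and avoids having to locate the intermediate point $x$ of the Lagrange form.

```python
import math

def phi(x):
    # Assumed example generator phi(x) = x*log(x) - x.
    return x * math.log(x) - x

def dphi(x):
    return math.log(x)   # phi'(x)

def d2phi(x):
    return 1.0 / x       # phi''(x)

def breg(a, b):
    # D_phi(a, b) = phi(a) - phi(b) - phi'(b)(a - b).
    return phi(a) - phi(b) - dphi(b) * (a - b)

def right_ratio(a, h, q):
    # Ratio inside limit (A.7) with b = a + h, for a < b < q.
    b = a + h
    num = abs(math.sqrt(breg(a, q)) - math.sqrt(breg(b, q)))
    return num / math.sqrt(breg(b, a))

def limit_right(a, q):
    # Ratio of the limiting numerator h(phi'(q)-phi'(a)) / (2 sqrt(D_phi(a,q)))
    # to the limiting denominator h sqrt(phi''(a)/2).
    return (dphi(q) - dphi(a)) / math.sqrt(2.0 * d2phi(a) * breg(a, q))

if __name__ == "__main__":
    for h in (1e-2, 1e-3, 1e-4):
        print(h, right_ratio(1.0, h, 2.0), limit_right(1.0, 2.0))
```

The step $h$ is kept moderately small on purpose: for very small $h$ the direct evaluation of $D_\phi(b,a)$ suffers from floating-point cancellation.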

Corollary A.1. Given any interval $I = [x_1 x_2]$ on the real line, there exists a finite $\mu$ such that $\sqrt{D_\phi}$ is
left-sided $\mu$-defective with respect to $I$.

Proof. Follows from similar computation. □

We extend our results to $d$ dimensions naturally now by showing that if $M$ is a domain such that $\sqrt{D_{s\phi}}$ and
$\sqrt{D_\phi}$ are $\mu$-defective with respect to the projection of $M$ onto each coordinate axis, then $\sqrt{D_{s\phi}}$ and $\sqrt{D_\phi}$
are $\mu$-defective with respect to all of $M$.

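A.5 Proof of $\mu$-defectiveness in $d$ dimensions

Lemma A.5. Consider three points, $a = (a_1, \ldots, a_i, \ldots, a_d)$, $b = (b_1, \ldots, b_i, \ldots, b_d)$, $q = (q_1, \ldots, q_i, \ldots, q_d)$
such that $\left|\sqrt{D_{s\phi}(a_i,q_i)} - \sqrt{D_{s\phi}(b_i,q_i)}\right| < \mu\sqrt{D_{s\phi}(a_i,b_i)}$, $\forall\, 1 \le i \le d$. Then

$$\left|\sqrt{D_{s\phi}(a,q)} - \sqrt{D_{s\phi}(b,q)}\right| < \mu\sqrt{D_{s\phi}(a,b)} \tag{A.10}$$

Similarly, if $\left|\sqrt{D_\phi(a_i,q_i)} - \sqrt{D_\phi(b_i,q_i)}\right| < \mu\sqrt{D_\phi(b_i,a_i)}$, $\forall\, 1 \le i \le d$, then

$$\left|\sqrt{D_\phi(a,q)} - \sqrt{D_\phi(b,q)}\right| < \mu\sqrt{D_\phi(b,a)} \tag{A.11}$$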



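Proof. The claim (A.10) is equivalent to each of the following inequalities in turn:

$$\left|\sqrt{D_{s\phi}(a,q)} - \sqrt{D_{s\phi}(b,q)}\right| < \mu\sqrt{D_{s\phi}(a,b)}$$

$$D_{s\phi}(a,q) + D_{s\phi}(b,q) - 2\sqrt{D_{s\phi}(a,q) D_{s\phi}(b,q)} < \mu^2 D_{s\phi}(a,b)$$

$$\sum_{i=1}^{d} \left(D_{s\phi}(a_i,q_i) + D_{s\phi}(b_i,q_i)\right) - 2\sqrt{D_{s\phi}(a,q) D_{s\phi}(b,q)} < \mu^2 \sum_{i=1}^{d} D_{s\phi}(a_i,b_i)$$

$$\sum_{i=1}^{d} \left(D_{s\phi}(a_i,q_i) + D_{s\phi}(b_i,q_i) - \mu^2 D_{s\phi}(a_i,b_i)\right) < 2\sqrt{D_{s\phi}(a,q) D_{s\phi}(b,q)}$$

The last inequality is what we need to prove for $\mu$-defectiveness with respect to $a, b, q$. By assumption we
already have $\mu$-defectiveness w.r.t. each $a_i, b_i, q_i$, for every $1 \le i \le d$:

$$D_{s\phi}(a_i,q_i) + D_{s\phi}(b_i,q_i) - \mu^2 D_{s\phi}(a_i,b_i) < 2\sqrt{D_{s\phi}(a_i,q_i) D_{s\phi}(b_i,q_i)}$$

$$\sum_{i=1}^{d} \left(D_{s\phi}(a_i,q_i) + D_{s\phi}(b_i,q_i) - \mu^2 D_{s\phi}(a_i,b_i)\right) < 2\sum_{i=1}^{d} \sqrt{D_{s\phi}(a_i,q_i) D_{s\phi}(b_i,q_i)}$$

So to complete our proof we need only show:

$$\sum_{i=1}^{d} \sqrt{D_{s\phi}(a_i,q_i)}\sqrt{D_{s\phi}(b_i,q_i)} \le \sqrt{D_{s\phi}(a,q)}\sqrt{D_{s\phi}(b,q)} \tag{A.12}$$

But notice the following:

$$\sqrt{D_{s\phi}(a,q)} = \left(\sum_{i=1}^{d} D_{s\phi}(a_i,q_i)\right)^{1/2} = \left(\sum_{i=1}^{d} \left(\sqrt{D_{s\phi}(a_i,q_i)}\right)^2\right)^{1/2}$$

So inequality A.12 is simply a form of the Cauchy-Schwarz inequality, which states that for two vectors $u$
and $v$ in $\mathbb{R}^d$, $|\langle u, v \rangle| \le \|u\| \|v\|$, or that

$$\left|\sum_{i=1}^{d} u_i v_i\right| \le \left(\sum_{i=1}^{d} u_i^2\right)^{1/2} \left(\sum_{i=1}^{d} v_i^2\right)^{1/2}$$

The second part of the proposition can be derived by an essentially identical argument. □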
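The lifting argument of Lemma A.5 can be illustrated numerically: if $\mu$ is taken to be the largest per-coordinate defect ratio, the joint ratio in $d$ dimensions should not exceed it. The sketch below assumes a decomposable divergence built from the example generator $\phi(x) = x\log x - x$ in every coordinate; the function names are hypothetical.

```python
import math
import random

def dphi(x):
    # phi'(x) for the assumed per-coordinate generator phi(x) = x*log(x) - x.
    return math.log(x)

def sym_div_1d(a, b):
    # One-dimensional symmetrized divergence (a - b)(phi'(a) - phi'(b)).
    return (a - b) * (dphi(a) - dphi(b))

def sym_div(a, b):
    # Decomposable d-dimensional divergence: sum over coordinates.
    return sum(sym_div_1d(ai, bi) for ai, bi in zip(a, b))

def ratio(d_aq, d_bq, d_ab):
    # |sqrt(D(a,q)) - sqrt(D(b,q))| / sqrt(D(a,b)).
    return abs(math.sqrt(d_aq) - math.sqrt(d_bq)) / math.sqrt(d_ab)

def lifted_defect(a, b, q):
    # Returns (largest per-coordinate ratio, joint ratio in d dimensions).
    per_coord = max(
        ratio(sym_div_1d(ai, qi), sym_div_1d(bi, qi), sym_div_1d(ai, bi))
        for ai, bi, qi in zip(a, b, q)
    )
    joint = ratio(sym_div(a, q), sym_div(b, q), sym_div(a, b))
    return per_coord, joint

if __name__ == "__main__":
    random.seed(1)
    pts = [[random.uniform(0.5, 5.0) for _ in range(4)] for _ in range(3)]
    print(lifted_defect(*pts))
```

The comparison uses the non-strict form of the inequality, since the code sets $\mu$ equal to the largest coordinate ratio rather than strictly above it.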
B Numerical arguments for bisection

In our algorithms, we are required to bisect a given interval with respect to the distance measure $D$, as well as
construct points that lie a fixed distance away from a given point. We note that in both these operations, we
do not need exact answers: a constant factor approximation suffices to preserve all asymptotic bounds. In
particular, our algorithms assume two procedures:

1. Given interval $[ab] \subset \mathbb{R}$, find $x \in [ab]$ such that $(1 - \alpha)\sqrt{D_{s\phi}(a,x)} < \sqrt{D_{s\phi}(x,b)} < (1 + \alpha)\sqrt{D_{s\phi}(a,x)}$.

2. Given $q \in \mathbb{R}$ and distance $r$, find $x$ s.t. $\left|\sqrt{D_{s\phi}(q,x)} - r\right| < \alpha r$.

For a given $\sqrt{D_{s\phi}} : \mathbb{R} \to \mathbb{R}$ and precision parameter $0 < \alpha < 1$, we describe a procedure that yields an
$\alpha$-approximation in $O(\log c_0 + \log \mu + \log \frac{1}{\alpha})$ steps for both problems, where $c_0$ implicitly depends
on the domain of the convex function $\phi$:



$$c_0 = \max_{1 \le i \le d} \frac{\max_x \phi_i''(x)}{\min_y \phi_i''(y)} \tag{B.1}$$

Note that this implies linear convergence. While more involved numerical methods such as Newton's
method may yield better results, our approximation algorithm serves as a proof of concept that the numerical
precision is not problematic.

A careful adjustment of our NN-analysis now gives a O ^(log/x + logco + log ^) 2^''(1 + a^-^n^^di log^"^ n 

time complexity to compute a $(1 + \varepsilon)$-ANN to query point $q$.
We now describe some useful properties of $D_{s\phi}$.

Lemma B.1. Consider $\sqrt{D_{s\phi}} : \mathbb{R} \to \mathbb{R}$ such that $c_0 = \max_x \phi''(x) / \min_y \phi''(y)$. Then for any two intervals
$[x_1 x_2], [x_3 x_4] \subset \mathbb{R}$,

$$\frac{1}{c_0} \frac{|x_1 - x_2|}{|x_3 - x_4|} < \frac{\sqrt{D_{s\phi}(x_1,x_2)}}{\sqrt{D_{s\phi}(x_3,x_4)}} < c_0 \frac{|x_1 - x_2|}{|x_3 - x_4|} \tag{B.2}$$

Proof. The lemma follows by the definition of $c_0$ and by direct computation from the Lagrange form of
$\sqrt{D_{s\phi}(a,b)}$, i.e., $\sqrt{D_{s\phi}(a,b)} = \sqrt{\phi''(x_{ab})}\,|b - a|$, for some $x_{ab} \in [ab]$. □

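Lemma B.2. Given a point $q \in \mathbb{R}$, distance $r \in \mathbb{R}$, precision parameter $0 < \alpha < 1$ and a $\mu$-defective
$\sqrt{D_{s\phi}} : \mathbb{R} \to \mathbb{R}$, we can locate a point $x_i$ such that $\left|\sqrt{D_{s\phi}(q,x_i)} - r\right| < \alpha r$ in $O(\log \frac{1}{\alpha} + \log \mu + \log c_0)$ time.

Proof. Let $x$ be the point such that $\sqrt{D_{s\phi}(q,x)} = r$. We outline an iterative process, Algorithm 2, with $i$-th iterate $x_i$ that
converges to $x$. First note that, by the Lagrange form of $\sqrt{D_{s\phi}}$, $\sqrt{\min_y \phi''(y)} \le \sqrt{D_{s\phi}(q,x_0)} / (x_0 - q) \le \sqrt{\max_y \phi''(y)}$. It immediately follows that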



Algorithm 2 QueryApproxDist($q$, $r$, $c_0$, $\alpha$)

    Let $x_0 > q$ be such that $\sqrt{\min_y \phi''(y)}\,(x_0 - q) = r$
    Let step $= (x_0 - q)/2$
    repeat
        if $\sqrt{D_{s\phi}(q, x_i)} < r$ then
            $x_{i+1} = x_i +$ step
        else
            $x_{i+1} = x_i -$ step
        end if
        step $=$ step$/2$
    until $\left|\sqrt{D_{s\phi}(q, x_i)} - r\right| \le \alpha r$
    Return $x_i$



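$$r \le \sqrt{D_{s\phi}(q,x_0)} \le c_0^2 r.$$

By construction, $|x_i - x| \le |x_0 - q| / 2^i$. Hence by Lemma B.1, $\sqrt{D_{s\phi}(x_i,x)} < c_0^3 r / 2^i$. We now use
$\mu$-defectiveness to upper bound our error $\left|\sqrt{D_{s\phi}(q,x_i)} - \sqrt{D_{s\phi}(q,x)}\right|$ at the $i$-th iteration:

$$\left|\sqrt{D_{s\phi}(q,x_i)} - \sqrt{D_{s\phi}(q,x)}\right| < \frac{\mu c_0^3 r}{2^i} \tag{B.3}$$

Choosing $i$ such that $(\mu c_0^3)/2^i \le \alpha$ implies that $i \le \log \frac{1}{\alpha} + \log \mu + 3\log c_0$. □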
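The halving search analysed above can be sketched in code as follows. This is a minimal illustration rather than the authors' implementation: it assumes the caller supplies $\phi'$ as a function `dphi` and a positive lower bound `min_d2phi` on $\phi''$ over the relevant domain (used to place $x_0$ so that $\sqrt{D_{s\phi}(q, x_0)} \ge r$), and the names `sqrt_sym_div`, `query_approx_dist` and `max_iter` are hypothetical.

```python
import math

def sqrt_sym_div(a, b, dphi):
    # sqrt of the 1-D symmetrized divergence (a - b)(phi'(a) - phi'(b));
    # max(0, .) guards against tiny negative values from rounding.
    return math.sqrt(max(0.0, (a - b) * (dphi(a) - dphi(b))))

def query_approx_dist(q, r, alpha, dphi, min_d2phi, max_iter=200):
    # Find x > q with |sqrt(D(q, x)) - r| <= alpha * r by step halving.
    # Assumption: min_d2phi <= phi''(y) on [q, x0], so that x0 overshoots r.
    x0 = q + r / math.sqrt(min_d2phi)
    x = x0
    step = (x0 - q) / 2.0
    for _ in range(max_iter):
        dist = sqrt_sym_div(q, x, dphi)
        if abs(dist - r) <= alpha * r:
            return x
        if dist < r:
            x += step      # too close to q: move away
        else:
            x -= step      # too far from q: move back
        step /= 2.0
    raise RuntimeError("no iterate met the tolerance within max_iter steps")

if __name__ == "__main__":
    # Example with phi'(x) = log(x) on a domain where phi''(x) = 1/x >= 0.1.
    x = query_approx_dist(2.0, 0.5, 0.01, math.log, 0.1)
    print(x, sqrt_sym_div(2.0, x, math.log))
```

The iteration cap `max_iter` is a safeguard added for this sketch; the analysis above indicates that a number of steps logarithmic in $1/\alpha$, $\mu$ and $c_0$ suffices.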



An almost identical procedure can locate an approximate bisection point of interval $[ab]$ in $O(\log \mu +
\log c_0 + \log \frac{1}{\alpha})$ time, and similar techniques can be applied for $\sqrt{D_\phi}$. We omit the details here.